Abstract

Finding similar items based on co-occurrence data is an important data mining task with applications ranging from recommender systems to keyword-based advertising. A number of co-occurrence similarity functions have been proposed based on graph-theoretic, geometric, and statistical abstractions. Despite the variety of existing algorithms, however, there is no formal methodology for analyzing their properties and comparing their benefits and limitations. At the same time, the wide range of applications and domains where co-occurrence-based similarity functions are deployed limits the conclusiveness of experimental evaluations beyond the narrow task typically considered by each method.

This paper proposes an axiomatic approach to analyzing co-occurrence similarity functions. The approach is based on formulating general, domain-independent constraints that well-behaved methods must satisfy to avoid producing degenerate results. These constraints are derived from the impact that the continuous aggregation of co-occurrence data is expected to have on absolute or relative similarity estimates. Applying the proposed constraint-based analysis to several representative, popular similarity functions reveals that, surprisingly, none of them satisfies all constraints unconditionally. The analysis leads to the design of a theoretically well-justified similarity function called Random Walk with Sink (RWS). RWS is parameterized to satisfy the constraints unconditionally, and the parameterization has interesting probabilistic and graph-theoretic interpretations.


1 Introduction


Co-occurrence data is ubiquitous in modern data mining and machine learning applications, as it provides a very rich signal for inferring similarity between items, a common prediction task. The following are examples of problems where different types of co-occurrences are used to identify related items:

1. Item recommendation. Logs of consumption behavior (market-basket data) allow finding products for cross-promotion (e.g., in personalized recommendations) [17].

2. Query suggestion. Search engine logs associating queries with subsequently visited URLs allow identifying related queries based on URL co-visitation (e.g., for query suggestion or for matching advertisements to relevant keywords) [5, 14].

3. Related author search. Bibliography databases containing co-authorship data or co-occurrences of publications in the same venues allow finding similar authors (e.g., for finding related work, collaborators, or qualified reference letter writers) [18].

Because of the wide applicability of co-occurrence similarity functions, many of them have been proposed in the context of different domains. These methods can be roughly divided into the following groups based on their underlying formalizations:

1. Graph-theoretic methods represent items as nodes in a bipartite graph, with occurrence contexts forming the opposite partition and connected to item nodes by edges representing occurrences. Similarity corresponds to node nearness measures (e.g., the probability of reaching another node via a k-step random walk).

2. Geometric methods represent items as vectors in a metric space, with occurrence contexts corresponding to dimensions. Similarity corresponds to geometric measures of vector closeness (e.g., cosine similarity).

3. Probabilistic methods represent items as random events over which contexts define probability distributions, from which similarity is then computed (e.g., Pointwise Mutual Information).

It is important to note that these groups are not disjoint, as a number of methods can be interpreted as belonging to more than one. However, the bipartite graph representation of co-occurrence data provides a common underlying formalism and plays a central role in this paper.

1.1 Axiomatic Approach

Because similarity functions are typically a component in learning and mining applications, their experimental evaluation is tied to the specific task and domain at hand. Their performance then depends on suitability for the application, making empirical evaluations highly domain-specific. Thus, a fundamental question remains unanswered: how do we comparatively analyze different co-occurrence-based similarity methods? This paper describes a general framework that provides the basis for such comparative analysis. The framework is based on an axiomatic approach, deriving fundamental properties (axioms) that capture basic intuitions that any reasonable co-occurrence-based similarity measure must obey. These properties are obtained by considering the changes in similarity function output that are expected when new item co-occurrences are observed. Distinguishing between occurrences arriving in new or existing contexts, as well as between absolute and relative similarity, leads to the several types of constraints described in this paper.
Method                        A-NCM  R-NCM  A-QCM  R-QCM  A-TCM  R-TCM  DR
Common Neighbors              Yes    Yes    Yes    Yes    Yes    Yes    No
Cosine                        No†    No†    No†    Yes    No†    No†    No†
Jaccard                       No†    Yes    No     No     No     No     No†
Pointwise Mutual Information  No†    No†    No     No     No     No     No†
Adamic-Adar                   Yes    Yes    No     No     No     No     No
Forward Random Walk           No†    Yes    No†    Yes    Yes    Yes    No†
Backward Random Walk          No†    No†    Yes    Yes    No†    No†    No†
Mean Meeting Time             No†    No†    No†    No†    No†    No†    No†
Random Walk with Sink         Yes*   Yes    Yes*   Yes    Yes    Yes    Yes*

Table 1: Summary of the axiomatic analysis. ‘Yes’: a method satisfies an axiom on any data. ‘Yes*’: a method satisfies an axiom on any data given appropriate parameter settings. ‘No†’: a method satisfies an axiom only on data meeting specific conditions. ‘No’: a method does not satisfy an axiom on any data. Notice that only our proposed Random Walk with Sink (RWS) method, described in Section 5, satisfies all the axioms on any data given appropriate parameter settings.

1.2 Our Contributions

Applying axiomatic analysis to a number of popular similarity functions yields surprising results: no single method satisfies all constraints unconditionally, as summarized in Table 1. For example, Figure 1 (a) and (b) show that the Forward Random Walk (FRW) similarity decreases after the addition of a new context. For each method and each axiom, we either prove that the axiom is always satisfied or identify the specific conditions that lead to axiom violation. The ultimate utility of the proposed analysis framework is that it allows the shortcomings of current functions to be considered systematically, leading to the derivation of variants that overcome them, e.g., via data-dependent parameterization. This process is demonstrated by introducing a new variant of random walk-based similarity: random walks with sink (RWS). The method has an intuitive interpretation that explains its flexibility related to smoothing, and it avoids axiom violations suffered by regular FRW; e.g., Figure 1 (c) and (d) show how RWS satisfies an axiom that is violated by FRW.

The axiomatic approach has previously been applied to clustering and information retrieval functions [10, 7, 1], leading to a better understanding of them, and our results indicate that it can be equally fruitful for analyzing co-occurrence similarity functions. The primary contributions of the paper are the following:

1. Axiomatic framework. We propose a principled methodology for analyzing co-occurrence similarity functions based on their differential response to new observations.

2. Analysis and proofs. We analyze a number of commonly used similarity functions using our proposed abstraction, called context-wise decomposition, and derive the conditions under which they satisfy the axioms. We prove that no single method satisfies all axioms unconditionally.

3. Design of new similarity function. We demonstrate how axiomatic analysis allows designing a new data-driven, theoretically well-justified co-occurrence similarity function without degenerate properties.

The rest of this paper is organized as follows. Section 2 surveys commonly used methods for computing similarities in bipartite graphs. Section 3 introduces the axiomatic framework, defining desirable properties expected of well-behaved similarity functions, followed by Section 4, where the framework is applied to analyze the popular similarity functions. Based on the analysis, we propose in Section 5 a new similarity function that satisfies all the proposed properties. We summarize related work in Section 6 and conclude in Section 7.

Figure 1: Comparison of the Forward Random Walk (FRW) and our proposed Random Walk with Sink (RWS) method. In (a) and (b), the new context ‘CIKM’ decreases the FRW similarity between Jian Pei and Ravi Kumar, which is counterintuitive and violates Axiom 1 (‘New Context Monotonicity’). In (c) and (d), the same context increases the RWS similarity with the parameter s = 100, showing that RWS satisfies Axiom 1.


2 Similarity Functions for Co-occurrence Data


In this section, we describe several highly popular co-occurrence similarity functions and briefly discuss their properties. All considered functions compute the similarity between a query item $q$ and a target item $u \in T \setminus \{q\}$, where $T$ is the set of all items. Items can be represented as nodes on one side of a bipartite graph, with the set of contexts $C$ in which items are observed represented by nodes on the other side. Graph edges encode occurrences and may be unweighted or weighted, where weights represent occurrence properties, e.g., occurrence count. For a dataset with $n$ items and $m$ contexts, the graph corresponds to an $n \times m$ adjacency matrix $W$ whose $(i,j)$-th element $W_{ij}$ represents the occurrence weight of the $i$-th item in the $j$-th context. Table 2 summarizes the notation.

Common Neighbors (CN) method computes the similarity of two nodes as the total number of their common neighbors and has been used extensively in social network graphs [15]. In a co-occurrence graph, this corresponds to computing the dot product of the vectors representing the neighborhoods of the two nodes. Let $\Gamma(q)$ and $\Gamma(u)$ be the sets of nodes connected to nodes $q$ and $u$, respectively. Then, the common neighbor similarity $CN_W(q,u)$ of a target node $u$ to the query $q$ based on weight matrix $W$ is:

$$CN_W(q,u) = \sum_{c \in \Gamma(q) \cap \Gamma(u)} W_{qc} W_{uc}.$$
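As an illustration only, the following minimal Python sketch computes $CN_W(q,u)$ directly from the definition, assuming $W$ is stored as a dense list of rows (items) by columns (contexts). The toy matrix and the helper names `neighbors` and `common_neighbors` are hypothetical and not part of the paper.

```python
from typing import List, Set

Matrix = List[List[float]]


def neighbors(W: Matrix, i: int) -> Set[int]:
    """Gamma(i): indices of the contexts in which item i occurs (nonzero weight)."""
    return {c for c, w in enumerate(W[i]) if w != 0}


def common_neighbors(W: Matrix, q: int, u: int) -> float:
    """CN_W(q, u): sum of W_qc * W_uc over contexts shared by q and u."""
    shared = neighbors(W, q) & neighbors(W, u)
    return sum(W[q][c] * W[u][c] for c in shared)


# Hypothetical toy data: 3 items (rows) observed in 4 contexts (columns).
W = [
    [2, 1, 0, 1],  # query item q = 0
    [1, 0, 3, 1],  # target item u = 1
    [0, 1, 1, 0],  # target item v = 2
]
print(common_neighbors(W, 0, 1))  # shared contexts {0, 3}: 2*1 + 1*1 = 3
```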


Cosine (COS) similarity normalizes Common Neighbors by the product of the Euclidean norms of the query and target occurrence vectors, and is especially popular for computing textual similarity in information retrieval applications [13]. Formally, the similarity $COS_W(q,u)$ of a target node $u$ to the query $q$ based on weight matrix $W$ is:

$$COS_W(q,u) = \sum_{c \in \Gamma(q) \cap \Gamma(u)} \frac{W_{qc} W_{uc}}{\|W_{q:}\|_2 \, \|W_{u:}\|_2},$$

where $W_{q:}$ and $W_{u:}$ are the $q$-th and $u$-th rows of the matrix $W$, respectively.
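A minimal sketch of the cosine score under the same assumed list-of-rows layout; the function name `cosine` and the toy matrix are illustrative only.

```python
import math
from typing import List

Matrix = List[List[float]]


def cosine(W: Matrix, q: int, u: int) -> float:
    """COS_W(q, u): dot product of rows q and u divided by the product of their L2 norms."""
    # Contexts where either weight is zero contribute nothing to the dot product,
    # so summing over all columns equals summing over Gamma(q) & Gamma(u).
    dot = sum(wq * wu for wq, wu in zip(W[q], W[u]))
    norm_q = math.sqrt(sum(w * w for w in W[q]))
    norm_u = math.sqrt(sum(w * w for w in W[u]))
    return dot / (norm_q * norm_u)


W = [
    [2, 1, 0, 1],
    [1, 0, 3, 1],
    [0, 1, 1, 0],
]
print(cosine(W, 0, 1))  # 3 / (sqrt(6) * sqrt(11))
```

With 0/1 weights, $\|W_{q:}\|_2 = \sqrt{|\Gamma(q)|}$, so in the unweighted case the normalization depends only on how many contexts each item occurs in.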
Symbol        Definition
$q$           Query item with respect to which similarities of other items are computed.
$T$           Set of items between which similarity is computed.
$n$           Number of items, $n = |T|$.
$C$           Set of contexts in which item occurrences are observed.
$m$           Number of observation contexts, $m = |C|$.
$W$           $n \times m$ graph adjacency matrix.
$W_{ij}$      $(i,j)$-th element of $W$: the occurrence weight of the $i$-th item in the $j$-th context.
$W_{i:}$      Row vector containing the $i$-th row of $W$: the occurrence weights of item $i$ in all contexts.
$Q$           $n \times n$ diagonal degree matrix with $Q_{ii} = \sum_k W_{ik}$.
$D$           $m \times m$ diagonal degree matrix with $D_{jj} = \sum_k W_{kj}$.
$f_W(q,u)$    Similarity of item $u$ to query item $q$ computed via function $f$ based on weight matrix $W$.
$\Gamma(u)$   Set of graph nodes connected to node $u$.

Table 2: Table of symbols.


Jaccard (JAC) coefficient measures the similarity of two sets as the size of their intersection divided by the size of their union, ignoring occurrence weights in contrast to Common Neighbors and cosine similarity [13]. The Jaccard similarity score $JAC_W(q,u)$ of a target node $u$ to the query node $q$ based on weight matrix $W$ is:

$$JAC_W(q,u) = \frac{|\Gamma(q) \cap \Gamma(u)|}{|\Gamma(q) \cup \Gamma(u)|}.$$
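The following sketch (hypothetical helper names, same assumed list-of-rows layout) computes the Jaccard score and shows that raising an existing weight leaves it unchanged, since only set membership counts.

```python
from typing import List, Set

Matrix = List[List[float]]


def neighbors(W: Matrix, i: int) -> Set[int]:
    """Gamma(i): contexts in which item i has a nonzero weight."""
    return {c for c, w in enumerate(W[i]) if w != 0}


def jaccard(W: Matrix, q: int, u: int) -> float:
    """JAC_W(q, u) = |Gamma(q) & Gamma(u)| / |Gamma(q) | Gamma(u)|; weights are ignored."""
    gq, gu = neighbors(W, q), neighbors(W, u)
    return len(gq & gu) / len(gq | gu)


W = [
    [2, 1, 0, 1],
    [1, 0, 3, 1],
    [0, 1, 1, 0],
]
print(jaccard(W, 0, 1))  # |{0, 3}| / |{0, 1, 2, 3}| = 0.5

# Raising an existing weight does not change set membership, so the score is unchanged.
W[0][0] = 10
print(jaccard(W, 0, 1))  # still 0.5
```

This insensitivity to edge weights is consistent with the "No" entries for Jaccard in the co-occurrence monotonicity columns of Table 1.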


Pointwise Mutual Information (PMI) has been used extensively in the computational linguistics and information retrieval literature for calculating similarity between terms [19], modeling them as outcomes of random variables over different contexts. The PMI of two events $q$ and $u$ is defined as $\log \frac{p(q,u)}{p(q)\,p(u)}$, where $p(q)$, $p(u)$, and $p(q,u)$ are the probabilities that the events $q$, $u$, and $(q,u)$ are observed, respectively. In the co-occurrence setting we are considering, the probability $p(u)$ for an item $u \in T$ is defined as the probability that it has been observed in a randomly selected context $c \in C$, where $C$ is the set of all contexts: $p(u) = \frac{|\Gamma(u)|}{|C|}$, where $\Gamma(u)$ is the set of contexts in which $u$ occurred; analogously, $p(q,u) = \frac{|\Gamma(q) \cap \Gamma(u)|}{|C|}$. Then, the PMI similarity $PMI_W(q,u)$ of a target $u$ to the query $q$ based on weight matrix $W$ is:

$$PMI_W(q,u) = \log \left( |C| \, \frac{|\Gamma(q) \cap \Gamma(u)|}{|\Gamma(q)| \, |\Gamma(u)|} \right) \propto \frac{|\Gamma(q) \cap \Gamma(u)|}{|\Gamma(q)| \, |\Gamma(u)|}. \tag{1}$$

We use the final term of Equation (1) as the definition of the PMI similarity score.
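A hedged sketch comparing the full logarithmic PMI with the final term of Equation (1). For a fixed $W$, $|C|$ is a constant and the logarithm is monotone, so both forms rank target items identically; the function names are illustrative assumptions.

```python
import math
from typing import List, Set

Matrix = List[List[float]]


def neighbors(W: Matrix, i: int) -> Set[int]:
    """Gamma(i): contexts in which item i has a nonzero weight."""
    return {c for c, w in enumerate(W[i]) if w != 0}


def pmi_log(W: Matrix, q: int, u: int) -> float:
    """log(|C| * |Gamma(q) & Gamma(u)| / (|Gamma(q)| * |Gamma(u)|)); needs a shared context."""
    gq, gu = neighbors(W, q), neighbors(W, u)
    num_contexts = len(W[0])  # |C|: number of columns of W
    return math.log(num_contexts * len(gq & gu) / (len(gq) * len(gu)))


def pmi_score(W: Matrix, q: int, u: int) -> float:
    """Final term of Equation (1), used as the PMI similarity score."""
    gq, gu = neighbors(W, q), neighbors(W, u)
    return len(gq & gu) / (len(gq) * len(gu))


W = [
    [2, 1, 0, 1],
    [1, 0, 3, 1],
    [0, 1, 1, 0],
]
for target in (1, 2):
    print(target, pmi_score(W, 0, target), pmi_log(W, 0, target))
```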

Adamic-Adar (AA) method measures the similarity of two nodes by aggregating the importance scores of the common contexts between them [2]. The score of a common context is the reciprocal of the logarithm of the number of items occurring in it. Formally, the Adamic-Adar score $AA_W(q,u)$ of a target node $u$ to the query node $q$ based on weight matrix $W$ is:

$$AA_W(q,u) = \sum_{c \in \Gamma(q) \cap \Gamma(u)} \frac{1}{\log |\Gamma(c)|}.$$
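A minimal sketch of the Adamic-Adar score. The natural logarithm is an assumption here, since the text does not fix a base (a different base only rescales all scores). Helper names and the toy matrix are hypothetical.

```python
import math
from typing import List, Set

Matrix = List[List[float]]


def neighbors(W: Matrix, i: int) -> Set[int]:
    """Gamma(i) for an item: contexts in which item i has a nonzero weight."""
    return {c for c, w in enumerate(W[i]) if w != 0}


def context_members(W: Matrix, c: int) -> Set[int]:
    """Gamma(c) for a context: items with a nonzero weight in column c."""
    return {i for i, row in enumerate(W) if row[c] != 0}


def adamic_adar(W: Matrix, q: int, u: int) -> float:
    """AA_W(q, u): sum over shared contexts of 1 / log|Gamma(c)| (natural log assumed)."""
    shared = neighbors(W, q) & neighbors(W, u)
    # A shared context contains at least q and u, so |Gamma(c)| >= 2 and log|Gamma(c)| > 0.
    return sum(1.0 / math.log(len(context_members(W, c))) for c in shared)


W = [
    [1, 1, 0],  # query q = 0
    [1, 1, 1],  # target u = 1
    [0, 1, 1],  # another item
]
print(adamic_adar(W, 0, 1))  # 1/log(2) + 1/log(3): the smaller context 0 counts more
```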


Forward Random Walk (FRW) method models similarity as the probability that a random walk starting from the query node arrives at the target node after a specified number of steps [6]. Imagine a random walk on the graph starting from node $q$. The probability $FRW_W(q,u)$ that the walk arrives at node $u$ in 2 steps based on weight matrix $W$ is:

$$FRW_W(q,u) = [Q^{-1} W D^{-1} W^T]_{qu} = \sum_{c \in \Gamma(q) \cap \Gamma(u)} \frac{W_{qc} W_{uc}}{Q_{qq} D_{cc}},$$

where $Q$ and $D$ are the diagonal degree matrices: $Q_{ii} = \sum_k W_{ik}$, $D_{jj} = \sum_k W_{kj}$.
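The sketch below, written in plain Python without a matrix library (an assumption made to keep it dependency-free), checks that the sum form and the matrix form $Q^{-1} W D^{-1} W^T$ agree on a toy matrix. Function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def frw_sum(W: Matrix, q: int, u: int) -> float:
    """Sum form: sum over shared contexts c of W_qc * W_uc / (Q_qq * D_cc)."""
    Q_qq = sum(W[q])
    total = 0.0
    for c in range(len(W[0])):
        if W[q][c] and W[u][c]:
            D_cc = sum(row[c] for row in W)
            total += W[q][c] * W[u][c] / (Q_qq * D_cc)
    return total


def frw_matrix(W: Matrix) -> Matrix:
    """Matrix form Q^{-1} W D^{-1} W^T as an n x n list of lists (assumes no empty rows/columns)."""
    n, m = len(W), len(W[0])
    Q = [sum(W[i]) for i in range(n)]
    D = [sum(W[i][c] for i in range(n)) for c in range(m)]
    return [
        [sum(W[i][c] * W[j][c] / (Q[i] * D[c]) for c in range(m)) for j in range(n)]
        for i in range(n)
    ]


W = [
    [2, 1, 0, 1],
    [1, 0, 3, 1],
    [0, 1, 1, 0],
]
P = frw_matrix(W)
print(frw_sum(W, 0, 1), P[0][1])  # both 7/24: 2*1/(4*3) + 1*1/(4*2)
print(sum(P[0]))  # each row is a distribution over end nodes, summing to 1 up to rounding
```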

Backward Random Walk (BRW) method computes similarity as the probability that a random walk starting from target node $u$ arrives at the query node $q$, thus traversing in the reverse direction compared to the forward walk [6]. The probability $BRW_W(q,u)$ that the walk arrives at item $q$ in 2 steps based on weight matrix $W$ is:

$$BRW_W(q,u) = [W D^{-1} W^T Q^{-1}]_{qu} = \sum_{c \in \Gamma(q) \cap \Gamma(u)} \frac{W_{qc} W_{uc}}{Q_{uu} D_{cc}}.$$
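Comparing the two sum forms term by term shows that $BRW_W(q,u) = FRW_W(u,q)$: only the degree of the walk's starting node appears in the denominator. The following sketch (hypothetical helper names) checks this on the toy matrix used above.

```python
from typing import List

Matrix = List[List[float]]


def column_sum(W: Matrix, c: int) -> float:
    """D_cc: total weight of context c."""
    return sum(row[c] for row in W)


def frw(W: Matrix, q: int, u: int) -> float:
    """Two-step walk q -> context -> u: sum_c W_qc W_uc / (Q_qq D_cc)."""
    return sum(W[q][c] * W[u][c] / (sum(W[q]) * column_sum(W, c))
               for c in range(len(W[0])) if W[q][c] and W[u][c])


def brw(W: Matrix, q: int, u: int) -> float:
    """Two-step walk u -> context -> q: sum_c W_qc W_uc / (Q_uu D_cc)."""
    return sum(W[q][c] * W[u][c] / (sum(W[u]) * column_sum(W, c))
               for c in range(len(W[0])) if W[q][c] and W[u][c])


W = [
    [2, 1, 0, 1],
    [1, 0, 3, 1],
    [0, 1, 1, 0],
]
print(brw(W, 0, 1))  # 2*1/(5*3) + 1*1/(5*2) = 7/30
print(brw(W, 0, 1) == frw(W, 1, 0))  # BRW(q, u) equals FRW(u, q)
```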

Mean Meeting Time (MMT) method computes similarity based on two independent random walks that start from the query and the target nodes, respectively [9]. MMT is defined as the one-step meeting probability, i.e., the probability that the two walks arrive in a shared context. The similarity $MMT_W(q,u)$ of a target node $u$ to the query $q$ based on weight matrix $W$ is:

$$MMT_W(q,u) = [Q^{-1} W (Q^{-1} W)^T]_{qu} = \sum_{c \in \Gamma(q) \cap \Gamma(u)} \frac{W_{qc} W_{uc}}{Q_{qq} Q_{uu}}.$$
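A minimal sketch of the one-step meeting probability: each walk independently moves from its item to a context with probability proportional to the edge weight, and the score is the probability that both land in the same context. It equals $CN_W(q,u) / (Q_{qq} Q_{uu})$ and is symmetric in $q$ and $u$. The function name is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def mmt(W: Matrix, q: int, u: int) -> float:
    """One-step meeting probability of independent walks from q and from u."""
    Q_qq, Q_uu = sum(W[q]), sum(W[u])
    # P(walk from q enters c) * P(walk from u enters c), summed over contexts c.
    return sum((W[q][c] / Q_qq) * (W[u][c] / Q_uu) for c in range(len(W[0])))


W = [
    [2, 1, 0, 1],
    [1, 0, 3, 1],
    [0, 1, 1, 0],
]
print(mmt(W, 0, 1))  # (2*1 + 1*1) / (4 * 5) = 0.15, up to floating-point rounding
print(mmt(W, 0, 1) == mmt(W, 1, 0))  # symmetric in q and u
```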

Mean meeting time can be defined more generally as the expected minimum number of steps needed for the two random walks to meet in a common node, while forward and backward random walks can be considered more generally with more than two steps. In this paper, we focus on the basic cases of two-step forward and backward walks and one-step mean meeting time, leaving the analysis of multi-step walks for future work.

3 Similarity Axioms

What makes each of the methods described in Section 2 suitable or unsuitable for a particular application? To answer this question, we need to define fundamental characteristics that are desirable for co-occurrence similarity functions. This section describes such characteristics and formalizes them as axioms based on the bipartite graph representation. The setting for these axioms captures the real-world phenomena underlying the continuous aggregation of new co-occurrence data: new papers being published, users issuing new queries or making purchases, articles being edited in Wikipedia. In this setting, each axiom corresponds to the effect that the arrival of new observations is expected to have on similarity function output.

There are two primary scenarios for the use of similarity functions in applications: selection of the k most similar items (nearest neighbors), and selection of all near neighbors whose similarity to the query item is greater than some threshold R (neighborhood radius; e.g., see [3]). We will refer to these two scenarios as kNN and RNN selection¹. The two scenarios give rise to different constraints because kNN selection is primarily concerned with the correctness of the ranking of target items, while RNN selection is primarily concerned with the accuracy of similarity estimates for the target items. The two corresponding types of axioms can then be formulated as relative constraints (concerning potential changes in the ranking of a target item with respect to other target items) and absolute constraints (concerning potential changes in the actual similarity value for a target item).
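The difference between the two selection scenarios can be sketched as follows, using a hypothetical dictionary of similarity scores: uniformly halving every score preserves the ranking (so kNN selection is unchanged) but alters which items clear a fixed radius (so RNN selection changes). The names `knn` and `rnn` are illustrative only.

```python
from typing import Dict, List, Set


def knn(scores: Dict[str, float], k: int) -> List[str]:
    """kNN selection: the k targets with the highest similarity (depends only on ranking)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def rnn(scores: Dict[str, float], radius: float) -> Set[str]:
    """RNN selection: every target whose similarity exceeds the radius R (depends on values)."""
    return {item for item, s in scores.items() if s > radius}


scores = {"a": 0.9, "b": 0.4, "c": 0.7}
print(knn(scores, 2), sorted(rnn(scores, 0.5)))  # ['a', 'c'] ['a', 'c']

# Halving every score preserves the ranking but changes which items clear the radius.
halved = {item: s / 2 for item, s in scores.items()}
print(knn(halved, 2), sorted(rnn(halved, 0.5)))  # ['a', 'c'] []
```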

In the following, we formally define a basic set of relative and absolute axioms that capture our intuitions about how new observations of co-occurrence data should affect similarity.

¹In practical applications, kNN and RNN selection are often hybridized.



Figure 2: (a) New Context Monotonicity (NCM): joint observations in a new shared context increase similarity. (b) Query Co-occurrence Monotonicity (QCM): new query observations in a shared context increase similarity. (c) Target Co-occurrence Monotonicity (TCM): new target observations in a shared context increase similarity. (d) Diminishing Returns (DR): incremental similarity gains due to subsequent observations decrease as items share more contexts.


3.1 Monotonicity

The first group of axioms encodes the intuition that new observations of query and target items in a shared context provide additional evidence of association between them. This additional evidence implies that the observations must increase the similarity of the target item to the query (formulated via the absolute axioms), while the ranking of the target item cannot decrease (formulated via the relative axioms).

We first consider a scenario where a new context containing both the query and target items appears, corresponding to the arrival of a new context node in the bipartite graph, as shown in Figure 2 (a). Let $f_W(q,u)$ be the similarity of item $u$ to query item $q$ computed via function $f$ based on weight matrix $W$. The first axiom is defined as follows.

Property 1 (New Context Monotonicity) Let a new context $\hat{c}$ be observed with occurrences $W_{q\hat{c}}$ of the query item $q$ and $W_{u\hat{c}}$ of the target item $u$. Let $W$ be the original co-occurrence matrix, and $\tilde{W}$ be the matrix after the addition of the new context. Then, a well-behaved co-occurrence similarity function must satisfy the following constraints:

Absolute New Context Monotonicity (A-NCM): $f_{\tilde{W}}(q,u) > f_W(q,u)$.

Relative New Context Monotonicity (R-NCM): $f_{\tilde{W}}(q,u) > f_{\tilde{W}}(q,v)$, $\forall v$ s.t. $f_W(q,u) > f_W(q,v)$.
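Both constraints can be tested on a concrete matrix by appending a context column and comparing scores before and after. The sketch below uses hypothetical helpers `add_context`, `a_ncm_holds`, and `r_ncm_holds`; a single failing matrix refutes an axiom for a function, while a passing matrix proves nothing in general. In this toy case Common Neighbors grows, while the two-step FRW score drops, in the same spirit as the FRW violation in Figure 1.

```python
from typing import Callable, Dict, List

Matrix = List[List[float]]
Similarity = Callable[[Matrix, int, int], float]


def cn(W: Matrix, q: int, u: int) -> float:
    """Common Neighbors: sum_c W_qc * W_uc."""
    return sum(wq * wu for wq, wu in zip(W[q], W[u]))


def frw(W: Matrix, q: int, u: int) -> float:
    """Two-step forward random walk: sum_c W_qc W_uc / (Q_qq D_cc)."""
    Q_qq = sum(W[q])
    total = 0.0
    for c in range(len(W[0])):
        if W[q][c] and W[u][c]:
            total += W[q][c] * W[u][c] / (Q_qq * sum(row[c] for row in W))
    return total


def add_context(W: Matrix, weights: Dict[int, float]) -> Matrix:
    """Copy of W with one new context column; `weights` maps item index -> occurrence weight."""
    return [row + [weights.get(i, 0)] for i, row in enumerate(W)]


def a_ncm_holds(f: Similarity, W: Matrix, q: int, u: int, w_q: float, w_u: float) -> bool:
    W_new = add_context(W, {q: w_q, u: w_u})
    return f(W_new, q, u) > f(W, q, u)


def r_ncm_holds(f: Similarity, W: Matrix, q: int, u: int, w_q: float, w_u: float) -> bool:
    W_new = add_context(W, {q: w_q, u: w_u})
    others = [v for v in range(len(W)) if v not in (q, u) and f(W, q, u) > f(W, q, v)]
    return all(f(W_new, q, u) > f(W_new, q, v) for v in others)


# q = 0 and u = 1 share a single context in which u is observed three times.
W = [[1], [3]]
print(a_ncm_holds(cn, W, 0, 1, 1, 1))   # True: CN grows from 3 to 4
print(a_ncm_holds(frw, W, 0, 1, 1, 1))  # False: FRW falls from 0.75 to 0.625
```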

The next two monotonicity axioms capture the expected response of the similarity function to the arrival of new observations in existing contexts where co-occurrences of target and query items were observed previously. In the bipartite graph representation, such observations correspond to an increase in the weight of an existing edge connecting an item to the shared context, as shown in Figure 2 (b) and (c).

Property 2 (Query Co-occurrence Monotonicity) Let a new occurrence of query item $q$ with weight $\epsilon$ be observed in a context $\hat{c}$ where the target item has also been observed. Let $W$ be the original co-occurrence matrix, and $\tilde{W}$ be the matrix after the new query item observation, differing from $W$ in a single element: $\tilde{W}_{q\hat{c}} = W_{q\hat{c}} + \epsilon$. Then, a well-behaved co-occurrence similarity function must satisfy the following constraints:

Absolute Query Co-occurrence Monotonicity (A-QCM): $f_{\tilde{W}}(q,u) > f_W(q,u)$.

Relative Query Co-occurrence Monotonicity (R-QCM): $f_{\tilde{W}}(q,u) > f_{\tilde{W}}(q,v)$, $\forall v$ s.t. $f_W(q,u) > f_W(q,v)$.
Property 3 (Target Co-occurrence Monotonicity) Let a new occurrence of target item $u$ with weight $\epsilon$ be observed in a context $\hat{c}$ where the query item has also been observed. Let $W$ be the original co-occurrence matrix, and $\tilde{W}$ be the matrix after the new target item observation, differing from $W$ in a single element: $\tilde{W}_{u\hat{c}} = W_{u\hat{c}} + \epsilon$. Then, a well-behaved co-occurrence similarity function must satisfy the following constraints:

Absolute Target Co-occurrence Monotonicity (A-TCM): $f_{\tilde{W}}(q,u) > f_W(q,u)$.

Relative Target Co-occurrence Monotonicity (R-TCM): $f_{\tilde{W}}(q,u) > f_{\tilde{W}}(q,v)$, $\forall v$ s.t. $f_W(q,u) > f_W(q,v)$.

Overall, the monotonicity axioms guarantee that additional observations of the query and target items in a shared context, either new or previously seen, imply a stronger degree of association between them and hence must result in a higher similarity estimate while not lowering the ranking of the target item.
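The absolute co-occurrence axioms (A-QCM and A-TCM) can be probed the same way by increasing one existing edge weight. In this hedged sketch, `bump` and `absolute_cooccurrence_monotone` are hypothetical helpers; on the toy matrix, CN passes both checks, Jaccard fails both because it ignores weights, and BRW passes A-QCM but fails A-TCM, consistent with the corresponding rows of Table 1.

```python
import copy
from typing import Callable, List

Matrix = List[List[float]]
Similarity = Callable[[Matrix, int, int], float]


def cn(W: Matrix, q: int, u: int) -> float:
    return sum(wq * wu for wq, wu in zip(W[q], W[u]))


def jaccard(W: Matrix, q: int, u: int) -> float:
    gq = {c for c, w in enumerate(W[q]) if w}
    gu = {c for c, w in enumerate(W[u]) if w}
    return len(gq & gu) / len(gq | gu)


def brw(W: Matrix, q: int, u: int) -> float:
    Q_uu = sum(W[u])
    return sum(W[q][c] * W[u][c] / (Q_uu * sum(row[c] for row in W))
               for c in range(len(W[0])) if W[q][c] and W[u][c])


def bump(W: Matrix, item: int, c: int, eps: float) -> Matrix:
    """Copy of W with W[item][c] increased by eps (a new occurrence in existing context c)."""
    W_new = copy.deepcopy(W)
    W_new[item][c] += eps
    return W_new


def absolute_cooccurrence_monotone(f: Similarity, W: Matrix, q: int, u: int,
                                   item: int, c: int, eps: float) -> bool:
    """A-QCM when item == q, A-TCM when item == u; context c must already be shared by q and u."""
    assert W[q][c] and W[u][c], "c must be a shared context"
    return f(bump(W, item, c, eps), q, u) > f(W, q, u)


W = [
    [1, 1],  # query q = 0
    [3, 0],  # target u = 1, shares context 0 with q
    [0, 1],  # another item
]
for name, f in (("CN", cn), ("JAC", jaccard), ("BRW", brw)):
    print(name,
          absolute_cooccurrence_monotone(f, W, 0, 1, item=0, c=0, eps=1),  # A-QCM
          absolute_cooccurrence_monotone(f, W, 0, 1, item=1, c=0, eps=1))  # A-TCM
```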

3.2 Diminishing Returns

Next, we consider the rate of similarity increase, that is, the relative size of incremental similarity gains as more and more contexts are observed in which the query and the target items co-occur, as shown in Figure 2 (d). The diminishing returns axiom postulates that the relative impact of new co-occurrences must decline as more of them are observed. Intuitively, this property captures the process of continuously obtaining i.i.d. data occurrences from an underlying distribution that continues to generate new contexts (e.g., new user sessions in which queries co-occur, or new venues where researchers publish papers). Diminishing returns guarantees that the novelty of a new occurrence, conditioned on the previous occurrences of the same data, diminishes.

Property 4 (Diminishing Returns (DR)) Let $W$ be the current weight matrix where query item $q$ and target item $u$ have been observed to co-occur. Let $\tilde{W}$ be the weight matrix resulting from the addition of a new context $\hat{c}$ in which $q$ and $u$ co-occur, and let $\hat{W}$ be the weight matrix resulting from the subsequent addition of another new context node $\bar{c}$ in which $q$ and $u$ co-occur. Without loss of generality, assume all edges connecting $q$ and $u$ to $\hat{c}$ and $\bar{c}$ have the same weight $\theta$. Then, a well-behaved similarity function must satisfy the following constraint:

$$f_{\tilde{W}}(q,u) - f_W(q,u) > f_{\hat{W}}(q,u) - f_{\tilde{W}}(q,u).$$
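A minimal check of Property 4, assuming two successive new contexts in which only $q$ and $u$ occur, each with weight $\theta$. The helper names are hypothetical. CN fails because every shared context adds exactly $\theta^2$, while Jaccard passes on this example (gains $1/6$, then $1/10$) but would fail for items whose neighborhoods already coincide.

```python
from typing import Callable, List

Matrix = List[List[float]]
Similarity = Callable[[Matrix, int, int], float]


def cn(W: Matrix, q: int, u: int) -> float:
    return sum(wq * wu for wq, wu in zip(W[q], W[u]))


def jaccard(W: Matrix, q: int, u: int) -> float:
    gq = {c for c, w in enumerate(W[q]) if w}
    gu = {c for c, w in enumerate(W[u]) if w}
    return len(gq & gu) / len(gq | gu)


def add_shared_context(W: Matrix, q: int, u: int, theta: float) -> Matrix:
    """Copy of W with a new context in which only q and u occur, each with weight theta."""
    return [row + [theta if i in (q, u) else 0] for i, row in enumerate(W)]


def dr_holds(f: Similarity, W: Matrix, q: int, u: int, theta: float = 1.0) -> bool:
    W1 = add_shared_context(W, q, u, theta)   # W-tilde: first new shared context
    W2 = add_shared_context(W1, q, u, theta)  # W-hat: second new shared context
    return f(W1, q, u) - f(W, q, u) > f(W2, q, u) - f(W1, q, u)


W = [
    [1, 1, 0],  # query q = 0
    [1, 0, 1],  # target u = 1; q and u share context 0 only
]
print(dr_holds(cn, W, 0, 1))       # False: each shared context adds exactly theta**2
print(dr_holds(jaccard, W, 0, 1))  # True: 1/3 -> 1/2 -> 3/5, gains 1/6 then 1/10
```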

4 Formal Analysis

In this section, we examine the compliance of each similarity function described in Section 2 with the axioms defined in Section 3. The analysis is simplified by introducing a unifying additive abstraction, context-wise decomposition, via which all of the considered similarity functions can be represented.

4.1 Unifying Framework for Similarity Functions

We observe that all the similarity functions in Section 2, although seemingly different, can be unified into our proposed abstraction called context-wise decomposition. Let us define the evidence score $e_W(c,q,u)$ to be context $c$'s direct contribution to the similarity of $q$ and $u$ in a similarity function $f_W(q,u)$. Then, each of the functions in Section 2 is represented by

$$f_W(q,u) = \sum_{c \in \Gamma(q) \cap \Gamma(u)} e_W(c,q,u), \tag{2}$$
Method  Evidence $e_W(c,q,u)$
CN      $W_{qc} W_{uc}$
COS     $\frac{W_{qc} W_{uc}}{\|W_{q:}\|_2 \, \|W_{u:}\|_2}$
JAC     $\frac{1}{|\Gamma(q) \cup \Gamma(u)|}$
PMI     $\frac{1}{|\Gamma(q)| \, |\Gamma(u)|}$
AA      $\frac{1}{\log |\Gamma(c)|}$
FRW     $\frac{W_{qc} W_{uc}}{\sum_j W_{qj} \sum_i W_{ic}}$
BRW     $\frac{W_{uc} W_{qc}}{\sum_j W_{uj} \sum_i W_{ic}}$
MMT     $\frac{W_{qc} W_{uc}}{\sum_j W_{qj} \sum_i W_{ui}}$

Table 3: Evidence score functions for different similarity calculation methods. These functions are plugged into Equation (2) to define the similarity methods.

meaning that the similarity score is the sum of the contexts' evidence scores. Table 3 lists the evidence score functions for the similarity methods introduced in Section 2. As we will see later in this section, this unified abstraction eases the formal analysis of the functions with regard to the axioms.
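Context-wise decomposition can be sketched as a single generic routine that sums a pluggable evidence function over shared contexts, per Equation (2). The four evidence functions below follow the CN, JAC, PMI, and FRW rows of Table 3; the names `decomposed_similarity` and `e_*` are hypothetical. On the toy matrix the sums reproduce the direct definitions: CN = 3, JAC = 0.5, PMI = 2/9, FRW = 7/24.

```python
from typing import Callable, List, Set

Matrix = List[List[float]]
Evidence = Callable[[Matrix, int, int, int], float]  # e_W(c, q, u)


def gamma(W: Matrix, i: int) -> Set[int]:
    """Gamma(i): contexts in which item i has a nonzero weight."""
    return {c for c, w in enumerate(W[i]) if w}


def e_cn(W: Matrix, c: int, q: int, u: int) -> float:
    return W[q][c] * W[u][c]


def e_jac(W: Matrix, c: int, q: int, u: int) -> float:
    return 1 / len(gamma(W, q) | gamma(W, u))


def e_pmi(W: Matrix, c: int, q: int, u: int) -> float:
    return 1 / (len(gamma(W, q)) * len(gamma(W, u)))


def e_frw(W: Matrix, c: int, q: int, u: int) -> float:
    return W[q][c] * W[u][c] / (sum(W[q]) * sum(row[c] for row in W))


def decomposed_similarity(e: Evidence, W: Matrix, q: int, u: int) -> float:
    """Equation (2): sum of evidence scores over the contexts shared by q and u."""
    return sum(e(W, c, q, u) for c in gamma(W, q) & gamma(W, u))


W = [
    [2, 1, 0, 1],
    [1, 0, 3, 1],
    [0, 1, 1, 0],
]
for name, e in (("CN", e_cn), ("JAC", e_jac), ("PMI", e_pmi), ("FRW", e_frw)):
    print(name, decomposed_similarity(e, W, 0, 1))
```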

Table 1 summarizes the formal analysis of these methods based on the axioms, indicating which axioms are satisfied by which similarity functions unconditionally, and which are satisfied only under certain conditions. The following subsections summarize and provide brief intuitions for each result, focusing particularly on cases where the axioms are not satisfied under certain conditions. Full proofs are provided in the Appendix.


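Method: A-NCM condition; R-NCM condition
CN: Always Yes; Always Yes
COS: $\frac{W_{q\hat{c}} W_{u\hat{c}} + \sum_c W_{qc} W_{uc}}{\sqrt{\|W_{q:}\|_2^2 + W_{q\hat{c}}^2} \, \sqrt{\|W_{u:}\|_2^2 + W_{u\hat{c}}^2}} > f_W(q,u)$; $\frac{W_{q\hat{c}} W_{u\hat{c}} + \sum_c W_{qc} W_{uc}}{\|W_{q:}\|_2 \, \sqrt{\|W_{u:}\|_2^2 + W_{u\hat{c}}^2}} > f_W(q,v)$
JAC: $1 > f_W(q,u)$; Always Yes
PMI: $\frac{1}{|\Gamma(q)| + |\Gamma(u)| + 1} > f_W(q,u)$; [illegible in source]
AA: Always Yes; Always Yes
FRW: $\frac{W_{u\hat{c}}}{W_{q\hat{c}} + W_{u\hat{c}}} > f_W(q,u)$; Always Yes
BRW: $\frac{W_{q\hat{c}}}{W_{q\hat{c}} + W_{u\hat{c}}} > f_W(q,u)$; [illegible in source]
MMT: $W_{q\hat{c}} W_{u\hat{c}} > (\lambda - 1) \sum_c W_{qc} W_{uc}$; $W_{q\hat{c}} W_{u\hat{c}} > \gamma \sum_c W_{qc} W_{vc} - \sum_c W_{qc} W_{uc}$

Table 4: Summary of sufficient and necessary conditions for similarity functions to satisfy New Context Monotonicity. In MMT, $\lambda = \frac{\tilde{Q}_{qq} \tilde{Q}_{uu}}{Q_{qq} Q_{uu}}$ and $\gamma = \frac{\tilde{Q}_{uu}}{Q_{vv}}$, where $\tilde{Q}$ is the degree matrix of $\tilde{W}$.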


4.2 Analysis for New Context Monotonicity

In the New Context Monotonicity axioms, a new context $\hat{c}$ is observed containing occurrences of target item $u$ and query item $q$. Let $f_W(q,u)$ and $f_{\tilde{W}}(q,u)$ be the similarity of $u$ to $q$ before and after observing the new context, based on the corresponding weight matrices $W$ and $\tilde{W}$, respectively. Analogously, let $f_W(q,v)$ and $f_{\tilde{W}}(q,v)$ be the similarities of another target item $v$ with respect to query $q$ before and after the new context observation, respectively. Then, based on Equation (2), the four scores can be written as:

$$f_W(q,u) = \sum_{c \in \Gamma(q) \cap \Gamma(u),\, c \neq \hat{c}} e_W(c,q,u), \tag{3}$$

$$f_{\tilde{W}}(q,u) = e_{\tilde{W}}(\hat{c},q,u) + \sum_{c \in \Gamma(q) \cap \Gamma(u),\, c \neq \hat{c}} e_{\tilde{W}}(c,q,u), \tag{4}$$

$$f_W(q,v) = \sum_{c \in \Gamma(q) \cap \Gamma(v),\, c \neq \hat{c}} e_W(c,q,v), \tag{5}$$

$$f_{\tilde{W}}(q,v) = \sum_{c \in \Gamma(q) \cap \Gamma(v),\, c \neq \hat{c}} e_{\tilde{W}}(c,q,v). \tag{6}$$

4.2.1 Absolute New Context Monotonicity (A-NCM)

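We first provide a sufficient condition for A-NCM to hold.

Lemma 1 Similarity function $f$ satisfies A-NCM if the evidence for all contexts observed before $\hat{c}$ is not changed by the observation of $\hat{c}$, i.e., $e_W(c,q,u) = e_{\tilde{W}}(c,q,u)$ for all $c \neq \hat{c}$, and evidence scores are positive.

Proof 1 If $e_W(c,q,u) = e_{\tilde{W}}(c,q,u)$ for all $c \neq \hat{c}$, and $e_W(c,q,u) > 0$ for all $q$, $u$, $W$, then

$$f_{\tilde{W}}(q,u) = e_{\tilde{W}}(\hat{c},q,u) + \sum_{c \in \Gamma(q) \cap \Gamma(u),\, c \neq \hat{c}} e_{\tilde{W}}(c,q,u) = e_{\tilde{W}}(\hat{c},q,u) + f_W(q,u) > f_W(q,u).$$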


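Method: A-QCM condition; R-QCM condition
CN: Always Yes; Always Yes
COS: $\frac{\epsilon W_{u\hat{c}} + \sum_c W_{qc} W_{uc}}{\sqrt{\|W_{q:}\|_2^2 + \epsilon (\epsilon + 2 W_{q\hat{c}})} \, \|W_{u:}\|_2} > f_W(q,u)$; Always Yes
JAC: Always No; Always No
PMI: Always No; Always No
AA: Always No; Always No
FRW: $f_W(q,u) < \frac{1}{\epsilon} \left( \frac{(W_{q\hat{c}} + \epsilon) \, W_{u\hat{c}}}{D_{\hat{c}\hat{c}} + \epsilon} - \frac{W_{q\hat{c}} W_{u\hat{c}}}{D_{\hat{c}\hat{c}}} \right)$; Always Yes
BRW: Always Yes; Always Yes
MMT: $\left( \frac{W_{q\hat{c}} + \epsilon}{Q_{qq} + \epsilon} - \frac{W_{q\hat{c}}}{Q_{qq}} \right) \frac{W_{u\hat{c}}}{Q_{uu}} > \frac{\epsilon}{Q_{qq} + \epsilon} \sum_{c \neq \hat{c}} \frac{W_{qc} W_{uc}}{Q_{qq} Q_{uu}}$; [illegible in source]

Table 5: Summary of sufficient and necessary conditions for similarity functions to satisfy Query Co-occurrence Monotonicity; $\epsilon$ is the additional edge weight described in Axiom 2 in Section 3.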

The lemma effectively states that if the addition of a new context $\hat{c}$ does not affect the evidence scores for existing contexts, similarity is guaranteed to increase (as long as the evidence from the new context is positive). Of the similarity functions above, only Common Neighbors and Adamic-Adar satisfy the condition in Lemma 1, and hence they unconditionally satisfy the A-NCM axiom. Other methods satisfy A-NCM only conditionally, with Table 4 summarizing the conditions for each one. We note that all the conditions in Tables 4 to 7 are both sufficient and necessary.
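As a worked check of one conditional case, consider FRW. Working directly from its definition in Section 2 (our own derivation, offered as a sketch): adding $\hat{c}$ leaves $D_{cc}$ unchanged for every old context but raises $Q_{qq}$ to $Q_{qq} + W_{q\hat{c}}$, so $f_{\tilde W}(q,u) = \frac{Q_{qq} f_W(q,u) + W_{q\hat c} W_{u\hat c}/(W_{q\hat c} + W_{u\hat c})}{Q_{qq} + W_{q\hat c}}$, which exceeds $f_W(q,u)$ exactly when $\frac{W_{u\hat c}}{W_{q\hat c} + W_{u\hat c}} > f_W(q,u)$. The script below compares that closed-form test with the observed outcome on random small integer matrices, using exact rational arithmetic so that ties are not blurred by rounding. The helper names and the random-matrix setup are assumptions.

```python
import random
from fractions import Fraction
from typing import List

Matrix = List[List[int]]


def frw(W: Matrix, q: int, u: int) -> Fraction:
    """Exact two-step forward random walk score sum_c W_qc W_uc / (Q_qq D_cc)."""
    Q_qq = sum(W[q])
    total = Fraction(0)
    for c in range(len(W[0])):
        if W[q][c] and W[u][c]:
            D_cc = sum(row[c] for row in W)
            total += Fraction(W[q][c] * W[u][c], Q_qq * D_cc)
    return total


def add_context(W: Matrix, q: int, u: int, w_q: int, w_u: int) -> Matrix:
    """Copy of W with a new context holding only q (weight w_q) and u (weight w_u)."""
    return [row + [w_q if i == q else w_u if i == u else 0] for i, row in enumerate(W)]


def count_mismatches(trials: int = 2000, seed: int = 0) -> int:
    """Compare the observed A-NCM outcome for FRW with the closed-form condition."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(trials):
        n, m = rng.randint(2, 4), rng.randint(1, 4)
        W = [[rng.randint(0, 3) for _ in range(m)] for _ in range(n)]
        W[0][0], W[1][0] = rng.randint(1, 3), rng.randint(1, 3)  # q=0, u=1 co-occur
        w_q, w_u = rng.randint(1, 3), rng.randint(1, 3)
        before = frw(W, 0, 1)
        observed = frw(add_context(W, 0, 1, w_q, w_u), 0, 1) > before
        predicted = Fraction(w_u, w_q + w_u) > before
        mismatches += observed != predicted
    return mismatches


print(count_mismatches())  # 0
```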
4.2.2 Relative New Context Monotonicity (R-NCM)


Next, Lemma 2 provides a sufficient condition for R-NCM to hold. It states that R-NCM holds for every similarity function that preserves the ranking of target nodes after the new context is added, even when the additional evidence score contributed by the new context is not counted.

Lemma 2 If a similarity function f satisfies $f_{W'}(q,u) - e_{W'}(\hat{c},q,u) > f_{W'}(q,v)$ and $e_{W'}(\hat{c},q,u) > 0$, then f satisfies R-NCM.

Proof 2 $f_{W'}(q,u) = \left(f_{W'}(q,u) - e_{W'}(\hat{c},q,u)\right) + e_{W'}(\hat{c},q,u) > f_{W'}(q,v) + e_{W'}(\hat{c},q,u) > f_{W'}(q,v)$. Thus, $f_{W'}(q,u) > f_{W'}(q,v)$.

It can be checked that Common Neighbors, Adamic-Adar, and Forward Random Walk satisfy the condition in Lemma 2. Jaccard similarity does not satisfy the condition, but it can nonetheless be shown to
always  satisfy  R-NCM.  Other  methods  only  satisfy  R-NCM  conditionally.  Table  4  provides  a  summary  of 
the  conditions  for  the  different  similarity  functions  to  satisfy  NCM. 

4.3  Analysis  for  Query  Co-occurrence  Monotonicity 

Let $f_W(q,u)$ and $f_{W'}(q,u)$ be the similarity scores of target item u to query item q before and after the new query observation, respectively. Using the evidence score decomposition in Equation (2), the two scores can be written as:


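$$f_W(q,u) = e_W(\hat{c},q,u) + \sum_{c \in \Gamma(q)\cap\Gamma(u),\, c \neq \hat{c}} e_W(c,q,u), \qquad (7)$$

$$f_{W'}(q,u) = e_{W'}(\hat{c},q,u) + \sum_{c \in \Gamma(q)\cap\Gamma(u),\, c \neq \hat{c}} e_{W'}(c,q,u). \qquad (8)$$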

For the similarities of target v to query q, $f_W(q,v)$ and $f_{W'}(q,v)$, we utilize the expressions in Equations (5) and (6).
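The decomposition in Equations (7) and (8) can be made concrete by computing the per-context evidence terms before and after an extra query occurrence. As an illustration only, the sketch assumes that Forward Random Walk evidence takes the form $e_W(c,q,u) = W_{qc}W_{uc}/(Q_{qq}D_{cc})$, which sums to the FRW score; the helper names `frw_evidence` and `with_extra_weight` and the toy matrix are hypothetical.

```python
def frw_evidence(W, q, u):
    """Per-context FRW evidence e_W(c, q, u) = W[q][c] * W[u][c] / (Q_qq * D_cc)."""
    Q_qq = sum(W[q])
    D = [sum(col) for col in zip(*W)]
    return [W[q][c] * W[u][c] / (Q_qq * D[c]) if D[c] > 0 else 0.0
            for c in range(len(D))]


def with_extra_weight(W, item, context, eps):
    """Copy of W with eps added to W[item][context] (a new query or target observation)."""
    W2 = [row[:] for row in W]
    W2[item][context] += eps
    return W2


# Rows: q = 0, u = 1, v = 2; columns: contexts c0, c1, c2.
W = [[1.0, 2.0, 0.0],
     [1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]
c_hat, eps = 0, 1.0

before = frw_evidence(W, 0, 1)
after = frw_evidence(with_extra_weight(W, 0, c_hat, eps), 0, 1)
print("before:", [round(e, 4) for e in before], "total", round(sum(before), 4))
print("after: ", [round(e, 4) for e in after], "total", round(sum(after), 4))
```

Here the evidence from c1, whose weights did not change, still shrinks from 1/6 to 1/8 because $Q_{qq}$ grows from 3 to 4, so the second condition of Lemma 3 fails and the FRW score drops from 1/3 to 7/24.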

4.3.1  Absolute  Query  Co-occurrence  Monotonicity  (A-QCM) 

A sufficient condition to satisfy A-QCM is given in Lemma 3. It states that when more query item occurrences are observed in a context already shared with a target item, the similarity of the target item will increase as long as the corresponding evidence score increases and the evidence scores for the other contexts shared by the query and the target item do not decrease.

Lemma 3 If the evidence score function $e_W$ of a similarity function f satisfies $e_W(\hat{c},q,u) < e_{W'}(\hat{c},q,u)$ and $\sum_{c \in C^*} e_W(c,q,u) \le \sum_{c \in C^*} e_{W'}(c,q,u)$, where $C^* = \{c \mid c \in \Gamma(q)\cap\Gamma(u),\ c \neq \hat{c}\}$, then f satisfies A-QCM.


Proof  3  Directly  follows  from  applying  Equations  (5)-(8)  to  the  condition. 

It  can  be  checked  that  Common  Neighbors  and  Backward  Random  Walk  meet  the  condition  in  Lemma  3, 
and  thus  satisfy  A-QCM  unconditionally.  Jaccard  similarity,  PMI,  and  Adamic-Adar  don’t  satisfy  A-QCM, 
as  they  don’t  take  into  consideration  the  edge  weights  in  their  computation.  Cosine  similarity,  Forward 
Random  Walk  and  Mean  Meeting  Time  satisfy  A-QCM  conditionally,  with  the  conditions  for  each  one 
summarized  in  Table  5. 
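Method | A-TCM | R-TCM
CN | Always Yes | Always Yes
COS | $\frac{\epsilon W_{q\hat{c}} + \sum_c W_{qc}W_{uc}}{\|W_{q:}\|_2\sqrt{\|W_{u:}\|_2^2 + \epsilon(\epsilon + 2W_{u\hat{c}})}} > f_W(q,u)$ | $\frac{\epsilon W_{q\hat{c}} + \sum_c W_{qc}W_{uc}}{\|W_{q:}\|_2\sqrt{\|W_{u:}\|_2^2 + \epsilon(\epsilon + 2W_{u\hat{c}})}} > f_W(q,u)$
JAC | Always No | Always No
PMI | Always No | Always No
AA | Always No | Always No
FRW | Always Yes | Always Yes
BRW | $\epsilon < \frac{1}{f_W(q,u)}\left(W_{q\hat{c}} - \frac{W_{q\hat{c}}W_{u\hat{c}}}{D_{\hat{c}\hat{c}}}\right) - D_{\hat{c}\hat{c}}$ | $\epsilon < \frac{1}{f_W(q,u)}\left(W_{q\hat{c}} - \frac{W_{q\hat{c}}W_{u\hat{c}}}{D_{\hat{c}\hat{c}}}\right) - D_{\hat{c}\hat{c}}$
MMT | $\left(\frac{W_{u\hat{c}}+\epsilon}{Q_{uu}+\epsilon} - \frac{W_{u\hat{c}}}{Q_{uu}}\right)\frac{W_{q\hat{c}}}{Q_{qq}} > \frac{\epsilon}{Q_{uu}+\epsilon}\sum_{c\neq\hat{c}}\frac{W_{qc}W_{uc}}{Q_{uu}Q_{qq}}$ | $\epsilon\left(W_{q\hat{c}} - \sum_c \frac{W_{qc}W_{vc}}{Q_{vv}}\right) > \sum_c W_{qc}\left(\frac{W_{vc}Q_{uu}}{Q_{vv}} - W_{uc}\right)$

Table 6: Summary of sufficient and necessary conditions for similarity functions to satisfy Target Co-occurrence Monotonicity. $\epsilon$ is the additional edge weight as described in Axiom 3 in Section 3.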


4.3.2  Relative  Query  Co-occurrence  Monotonicity  (R-QCM) 

Next, Lemma 4 describes a sufficient condition for axiom R-QCM to hold.


Lemma 4 Let q be a query node, and let u and v be target nodes with $f_W(q,u) > f_W(q,v)$. If a similarity function f satisfies the following condition under new appearances of q in a context ĉ shared with u but not with v, it preserves their ranking and hence satisfies axiom R-QCM:

$$\frac{f_{W'}(q,v)}{f_W(q,v)} = \frac{\sum_{c \in \Gamma(q)\cap\Gamma(u),\, c\neq\hat{c}} e_{W'}(c,q,u)}{\sum_{c \in \Gamma(q)\cap\Gamma(u),\, c\neq\hat{c}} e_W(c,q,u)} \le \frac{e_{W'}(\hat{c},q,u)}{e_W(\hat{c},q,u)}.$$


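Proof 4 Let $C^* = \{c \mid c \in \Gamma(q)\cap\Gamma(u),\ c \neq \hat{c}\}$. Then,

$$\begin{aligned}
f_{W'}(q,u) &= e_{W'}(\hat{c},q,u) + \sum_{c \in C^*} e_{W'}(c,q,u) \\
&= \frac{e_{W'}(\hat{c},q,u)}{e_W(\hat{c},q,u)}\, e_W(\hat{c},q,u) + \sum_{c \in C^*} e_{W'}(c,q,u) \\
&\ge \frac{\sum_{c \in C^*} e_{W'}(c,q,u)}{\sum_{c \in C^*} e_W(c,q,u)} \left( e_W(\hat{c},q,u) + \sum_{c \in C^*} e_W(c,q,u) \right) \\
&= \frac{\sum_{c \in C^*} e_{W'}(c,q,u)}{\sum_{c \in C^*} e_W(c,q,u)}\, f_W(q,u) \\
&> \frac{\sum_{c \in C^*} e_{W'}(c,q,u)}{\sum_{c \in C^*} e_W(c,q,u)}\, f_W(q,v) = \frac{f_{W'}(q,v)}{f_W(q,v)}\, f_W(q,v) = f_{W'}(q,v).
\end{aligned}$$

Thus, $f_{W'}(q,u) > f_{W'}(q,v)$.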


It can be checked that Common Neighbors, Cosine similarity, and Forward and Backward Random Walk satisfy the condition in Lemma 4, and thus satisfy axiom R-QCM. Jaccard similarity, PMI, and Adamic-Adar do not satisfy the axiom because they ignore edge weights, and hence ignore any new item observations in contexts where the items have been seen previously. Mean Meeting Time satisfies R-QCM conditionally, as summarized in Table 5.


4.4  Analysis  of  Target  Co-occurrence  Monotonicity 

Analogously to the previous section, the evidence score decomposition in Equations (5)-(8) allows us to examine the compliance of similarity functions with the absolute and relative constraints when new observations of a target item u are seen in a context ĉ shared with the query item q.
4.4.1 Absolute Target Co-occurrence Monotonicity (A-TCM)

The  sufficient  condition  for  the  axiom  A-TCM  can  be  stated  analogously  to  that  of  A-QCM:  when  more 
target occurrences are observed in a context already shared with the query item, similarity of the target item
will  increase  as  long  as  the  evidence  score  for  c  increases,  and  the  evidence  scores  for  other  shared  contexts 
between  the  query  and  the  target  item  don’t  decrease. 

Lemma 5 If the evidence score function $e_W$ of a similarity function f satisfies $e_W(\hat{c},q,u) < e_{W'}(\hat{c},q,u)$ and $\sum_{c \in C^*} e_W(c,q,u) \le \sum_{c \in C^*} e_{W'}(c,q,u)$, where $C^* = \{c \mid c \in \Gamma(q)\cap\Gamma(u),\ c \neq \hat{c}\}$, then f satisfies A-TCM.

Proof  5  The  proof  is  analogous  to  that  of  Lemma  3,  and  we  omit  it  for  brevity. 

It  can  be  checked  that  Common  Neighbors  and  Forward  Random  Walk  fulfill  the  condition  in  Lemma  5, 
and  thus  satisfy  axiom  A-TCM.  Jaccard  similarity,  PMI,  and  Adamic-Adar  don’t  satisfy  A-TCM  as  they 
ignore the edge weights and hence any additional occurrences in previous contexts. Cosine similarity, Backward Random Walk, and Mean Meeting Time satisfy A-TCM conditionally, as summarized in Table 6.

4.4.2  Relative  Target  Co-occurrence  Monotonicity  (R-TCM) 

Next,  we  provide  a  sufficient  condition  for  R-TCM,  which  is  analogous  to  that  of  R-QCM. 

Lemma 6 Let q be a query node, and let u and v be target nodes with $f_W(q,u) > f_W(q,v)$. If a similarity function f satisfies the following condition under new appearances of u in a context ĉ shared with q but not with v, it preserves their ranking and hence satisfies axiom R-TCM:

$$\frac{f_{W'}(q,v)}{f_W(q,v)} = \frac{\sum_{c \in \Gamma(q)\cap\Gamma(u),\, c\neq\hat{c}} e_{W'}(c,q,u)}{\sum_{c \in \Gamma(q)\cap\Gamma(u),\, c\neq\hat{c}} e_W(c,q,u)} \le \frac{e_{W'}(\hat{c},q,u)}{e_W(\hat{c},q,u)}.$$

Proof  6  The  proof  is  analogous  to  that  of  Lemma  4,  and  we  omit  it  for  brevity. 

It  can  be  checked  that  Common  Neighbors  and  Forward  Random  Walk  satisfy  the  condition  in  Lemma  6 
and  thus  satisfy  R-TCM  unconditionally.  Jaccard  similarity,  PMI,  and  Adamic-Adar  do  not  satisfy  R-TCM 
as they ignore edge weights and hence do not change their output when additional item occurrences are observed in existing contexts. Cosine similarity, Backward Random Walk, and Mean Meeting Time satisfy
R-TCM  conditionally  as  summarized  in  Table  6. 

4.5  Analysis  for  Diminishing  Returns 

Because the increase in similarity due to every subsequently observed co-occurrence context does not depend on the total number of shared contexts for Common Neighbors and Adamic-Adar, these similarities never satisfy the Diminishing Returns axiom: similarity increases linearly with every subsequent shared occurrence. Other methods satisfy Diminishing Returns conditionally, as summarized in Table 7. This indicates that in data streaming domains where new contexts are continually observed (e.g., new search query sessions), Common Neighbors and Adamic-Adar are not appropriate, while other methods should be monitored to ensure that similarity values grow sublinearly and converge as data is continuously aggregated.
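The diminishing-returns comparison can be made concrete by observing two successive new contexts shared by q and u, each with edge weight θ, and comparing the two increments. This is a minimal sketch under stated assumptions: Axiom 4 is checked as $f_{W'} - f_W > f_{W''} - f_{W'}$ with new contexts in which only q and u occur, Common Neighbors is the weighted sum $\sum_c W_{qc}W_{uc}$, and Jaccard operates on the context sets $\Gamma(\cdot)$ while ignoring weights. The helper names are hypothetical.

```python
def add_shared_context(W, q, u, theta):
    """Append a context in which only q and u occur, each with weight theta."""
    return [row + [theta if i in (q, u) else 0.0] for i, row in enumerate(W)]


def common_neighbors(W, q, u):
    """Weighted Common Neighbors (assumed form): sum_c W[q][c] * W[u][c]."""
    return sum(a * b for a, b in zip(W[q], W[u]))


def jaccard(W, q, u):
    """|Gamma(q) & Gamma(u)| / |Gamma(q) | Gamma(u)| over contexts with nonzero weight."""
    gq = {c for c, w in enumerate(W[q]) if w > 0}
    gu = {c for c, w in enumerate(W[u]) if w > 0}
    return len(gq & gu) / len(gq | gu)


def increments(f, W, q, u, theta):
    """Return (f_W' - f_W, f_W'' - f_W') for two successive shared contexts."""
    W1 = add_shared_context(W, q, u, theta)
    W2 = add_shared_context(W1, q, u, theta)
    return f(W1, q, u) - f(W, q, u), f(W2, q, u) - f(W1, q, u)


W = [[1.0, 1.0, 0.0],   # q
     [1.0, 0.0, 1.0]]   # u
for name, f in [("CN", common_neighbors), ("JAC", jaccard)]:
    first, second = increments(f, W, 0, 1, theta=1.0)
    print(name, round(first, 4), round(second, 4), "DR holds:", first > second)
```

Common Neighbors gains exactly 1.0 from each new context, so the increments never shrink, whereas Jaccard gains 1/6 and then 0.1, in line with its condition $1 > f_W(q,u)$ (here $f_W(q,u) = 1/3$).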

5  Random  Walk  with  Sink 

The previous section demonstrates that no similarity function under consideration satisfies all axioms unconditionally, implying that each may exhibit unintuitive, degenerate behavior. A natural question is whether we can design a similarity function that satisfies all axioms. This section demonstrates how this can be achieved by a regularized variant of random-walk-based similarity, which shows the benefit of our axiom-based analysis.




Method | DR
CN | Always No
COS | $\frac{2\theta^2 + 2\sum_c W_{qc}W_{uc}}{\sqrt{\|W_{q:}\|_2^2+\theta^2}\sqrt{\|W_{u:}\|_2^2+\theta^2}} - \frac{2\theta^2 + \sum_c W_{qc}W_{uc}}{\sqrt{\|W_{q:}\|_2^2+2\theta^2}\sqrt{\|W_{u:}\|_2^2+2\theta^2}} > f_W(q,u)$
JAC | $1 > f_W(q,u)$
PMI | see Lemma 32
AA | Always No
FRW | $\frac{1}{2} > f_W(q,u)$
BRW | $\frac{1}{2} > f_W(q,u)$
MMT | $3\theta^2 + (Q_{qq}+Q_{uu})\theta > \left(2\theta^2 + 3(Q_{qq}+Q_{uu})\theta + Q_{qq}^2 + Q_{uu}^2 + Q_{qq}Q_{uu}\right) f_W(q,u)$

Table 7: Summary of sufficient and necessary conditions for similarity functions to satisfy Diminishing Returns. $\theta$ is the weight of the new edges as described in Axiom 4.


5.1  Main  Idea 


We begin by observing from Table 1 that Absolute New Context Monotonicity (A-NCM) is, surprisingly, not satisfied unconditionally by any random-walk-based method, while intuition suggests that observing new, exclusive co-occurrences between two items should always increase the visitation probability for walks between them. While the appearance of a new shared context adds a new visitation path with associated probability mass (evidence score), it also triggers re-normalization, which may lead to a decline in the total probability of reaching the destination via the other co-occurrence contexts.

We  propose  to  remedy  this  issue  by  attaching  an  absorbing  ‘sink’  context  to  all  item  nodes,  effectively 
smoothing  the  associated  outgoing  context  probabilities.  The  weights  of  the  edges  between  item  nodes  and 
the sink node are constant, regardless of the degree of the item nodes. The sink context does not contribute
any  evidence  score  to  the  overall  similarity,  but  by  re-distributing  the  probability  mass  (evidence)  among  the 
co-occurrences,  it  ensures  that  addition  of  a  new  co-occurrence  does  not  remove  more  visitation  probability 
than  it  contributes. 

Formally, our proposed Random Walk with Sink (RWS) similarity is defined as the visitation probability of a forward random walk originating from the query node to visit the target node, with all item nodes being observed with weight s in an absorbing sink context $c_{sink}$. That is, $\mathrm{RWS}_W(q,u)$ from query node q to target node u based on the weight matrix W is:

$$\mathrm{RWS}_W(q,u) = \left[Q'^{-1} W D^{-1} W^T\right]_{qu} = \sum_{c} \frac{W_{qc}}{s + Q_{qq}}\,\frac{W_{uc}}{D_{cc}},$$

where $Q'$ is an adjusted diagonal degree matrix with $Q'_{ii} = s + Q_{ii}$.
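The definition translates directly into code. The sketch below is one possible reading, not the authors' implementation: it assumes the sink only enlarges each item's degree by the constant s (the sink contributes no evidence and leaves $D_{cc}$ unchanged for ordinary contexts), and the function name `rws` is hypothetical. Setting s = 0 recovers Forward Random Walk.

```python
def rws(W, q, u, s):
    """Random Walk with Sink: sum_c W[q][c] / (s + Q_qq) * W[u][c] / D_cc."""
    if s < 0:
        raise ValueError("sink weight s must be nonnegative")
    Q_qq = sum(W[q])
    D = [sum(col) for col in zip(*W)]  # context degrees exclude the sink
    return sum(W[q][c] / (s + Q_qq) * W[u][c] / D[c]
               for c in range(len(D)) if D[c] > 0)


W = [[1.0, 2.0, 0.0],   # q
     [1.0, 1.0, 0.0],   # u
     [0.0, 1.0, 1.0]]   # v
for s in (0.0, 1.0, 5.0):
    print(s, round(rws(W, 0, 1, s), 4))
```

Because s enters only the normalizer, for a fixed query q and fixed data it rescales every target's score by the same factor $Q_{qq}/(s+Q_{qq})$; rankings for that query are unchanged, while the relative weight of newly observed evidence against existing evidence changes.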

Let us see how the addition of the sink node with a fixed edge weight solves the problem of FRW’s conditional satisfiability of A-NCM. In RWS, the similarity increase due to the addition of the new context ĉ in A-NCM is given by

$$\frac{W_{q\hat{c}}}{s+Q_{qq}+W_{q\hat{c}}}\,\frac{W_{u\hat{c}}}{W_{q\hat{c}}+W_{u\hat{c}}},$$

and the similarity decrease of the existing contexts due to re-normalization is given by

$$\frac{W_{q\hat{c}}}{s+Q_{qq}+W_{q\hat{c}}}\sum_{c\neq\hat{c}}\frac{W_{qc}W_{uc}}{(s+Q_{qq})D_{cc}}.$$

Satisfying A-NCM requires the increase to be larger than the decrease, which leads to the condition

$$\frac{W_{u\hat{c}}}{W_{q\hat{c}}+W_{u\hat{c}}} > \sum_{c\neq\hat{c}}\frac{W_{qc}W_{uc}}{(s+Q_{qq})D_{cc}}.$$

Notice that by



increasing  s  enough,  and  thereby  decreasing  the  value  of  the  right  hand  term,  the  condition  can  be  satisfied. 
Thus,  RWS  can  be  parameterized  to  satisfy  A-NCM  by  imposing  enough  smoothing  on  the  random  walk 
similarity. 
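The condition above can be turned into the smallest admissible sink weight. This is a sketch under the same assumption as the text, namely that the new context ĉ contains only q and u (so $D_{\hat{c}\hat{c}} = W_{q\hat{c}} + W_{u\hat{c}}$); `min_sink_weight` is a hypothetical helper that evaluates the bound of Lemma 7, and `rws` and `add_context` are illustrative helpers as well.

```python
def rws(W, q, u, s):
    """RWS score: sum_c W[q][c] / (s + Q_qq) * W[u][c] / D_cc."""
    Q_qq = sum(W[q])
    D = [sum(col) for col in zip(*W)]
    return sum(W[q][c] / (s + Q_qq) * W[u][c] / D[c]
               for c in range(len(D)) if D[c] > 0)


def add_context(W, weights):
    """Append one new context column with the given per-item weights."""
    return [row + [w] for row, w in zip(W, weights)]


def min_sink_weight(W, q, u, w_q_new, w_u_new):
    """Lemma 7 bound: -Q_qq + (w_q_new + w_u_new) / w_u_new * sum_c W[q][c] W[u][c] / D_cc."""
    Q_qq = sum(W[q])
    D = [sum(col) for col in zip(*W)]
    alpha = sum(W[q][c] * W[u][c] / D[c] for c in range(len(D)) if D[c] > 0)
    return -Q_qq + (w_q_new + w_u_new) / w_u_new * alpha


W = [[1.0], [1.0]]           # q and u share one context
w_q_new, w_u_new = 1.0, 0.2  # new exclusive context in which u is observed weakly
W_new = add_context(W, [w_q_new, w_u_new])
s_min = min_sink_weight(W, 0, 1, w_q_new, w_u_new)
print("s must exceed", round(s_min, 3))
for s in (0.0, s_min + 0.5):
    print(round(s, 3), "similarity increases:", rws(W_new, 0, 1, s) > rws(W, 0, 1, s))
```

With s = 0 (plain FRW) the new context lowers the score, while any s above the bound (here 2) makes the increase outweigh the re-normalization loss.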

We show a working example of RWS in Figure 1, which contrasts Forward Random Walk and RWS on the DBLP bibliography dataset². We compute the similarity between authors by treating the venues in which they publish as co-occurrence contexts. Figure 1 (a) and (b) illustrate the effect of adding a new co-occurrence context ‘CIKM’ on the Forward Random Walk similarity between authors ‘Jian Pei’ and ‘Ravi Kumar’. The increase in visitation probability through this context is insufficient to make up for the decrease in probability mass going through the other co-occurrence contexts (‘VLDB’, ‘SIGMOD’, and ‘KDD’). Figure 1 (c) and (d) show that introducing the sink node $c_{sink}$ with a sufficient smoothing level results in similarity not decreasing when a new shared co-occurrence is observed.

5.2  Analysis 

Analysis of axiom satisfiability by RWS is performed analogously to that of Forward Random Walk, demonstrating that RWS also satisfies the R-NCM, R-QCM, A-TCM, and R-TCM axioms. For axioms A-NCM, A-QCM, and DR, setting the parameter s appropriately allows RWS to satisfy them, as summarized by the following results.

Lemma 7 RWS satisfies A-NCM if and only if $s > -Q_{qq} + \frac{W_{q\hat{c}} + W_{u\hat{c}}}{W_{u\hat{c}}} \sum_{c \neq \hat{c}} \frac{W_{qc}W_{uc}}{D_{cc}}$.

Proof  7  The  full  proof  is  provided  in  the  Appendix. 


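Lemma 8 RWS satisfies A-QCM if and only if

$$s > \frac{1}{D_{\hat{c}\hat{c}} - W_{q\hat{c}}}\left( W_{q\hat{c}}(D_{\hat{c}\hat{c}} + \epsilon) - Q_{qq}(D_{\hat{c}\hat{c}} - W_{q\hat{c}}) + \frac{(D_{\hat{c}\hat{c}}+\epsilon)\, D_{\hat{c}\hat{c}}}{W_{u\hat{c}}} \sum_{c \neq \hat{c}} \frac{W_{qc}W_{uc}}{D_{cc}} \right).$$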


Proof  8  The  full  proof  is  provided  in  the  Appendix. 


Lemma 9 RWS satisfies DR if and only if $s > 2\alpha - Q_{qq}$, where $\alpha = \sum_c \frac{W_{qc}W_{uc}}{D_{cc}}$.

Proof  9  The  full  proof  is  provided  in  the  Appendix. 

Thus, the addition of the sink node enables RWS to satisfy all axioms via an appropriate setting of the parameter s, as shown in Lemmas 7 to 9.
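The DR bound can be checked the same way. This sketch assumes, as in the appendix proofs, that Diminishing Returns compares the increments from two successive new contexts in which only q and u occur, each with weight θ, and it reads α in Lemma 9 as $\sum_c W_{qc}W_{uc}/D_{cc}$; all helper names are hypothetical.

```python
def rws(W, q, u, s):
    """RWS score: sum_c W[q][c] / (s + Q_qq) * W[u][c] / D_cc."""
    Q_qq = sum(W[q])
    D = [sum(col) for col in zip(*W)]
    return sum(W[q][c] / (s + Q_qq) * W[u][c] / D[c]
               for c in range(len(D)) if D[c] > 0)


def add_shared_context(W, q, u, theta):
    """Append a context in which only q and u occur, each with weight theta."""
    return [row + [theta if i in (q, u) else 0.0] for i, row in enumerate(W)]


def diminishing_returns(W, q, u, s, theta):
    """True if the first new shared context raises RWS more than the second one."""
    W1 = add_shared_context(W, q, u, theta)
    W2 = add_shared_context(W1, q, u, theta)
    f0, f1, f2 = (rws(M, q, u, s) for M in (W, W1, W2))
    return f1 - f0 > f2 - f1


def dr_threshold(W, q, u):
    """Lemma 9 bound: 2 * alpha - Q_qq, alpha = sum_c W[q][c] * W[u][c] / D_cc."""
    D = [sum(col) for col in zip(*W)]
    alpha = sum(W[q][c] * W[u][c] / D[c] for c in range(len(D)) if D[c] > 0)
    return 2 * alpha - sum(W[q])


W = [[1.0], [3.0]]  # q and u share one context; the FRW (s = 0) score is 0.75
print("threshold:", dr_threshold(W, 0, 1))
for s in (0.0, 1.0):
    print("s =", s, "DR holds:", diminishing_returns(W, 0, 1, s, theta=1.0))
```

Here the threshold is 0.5. With s = 0 the first new context lowers the score more than the second one, so DR fails, while s = 1 satisfies it; under the two-exclusive-contexts assumption the outcome does not depend on θ.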

We note that a variant of Backward Random Walk analogous to RWS can be designed by adding a sink with an axiom-driven parameterization. We also point out that RWS’s reliance on a constant smoothing parameter, s, is a key distinction from PageRank-style smoothing of random walks, where the edge weight changes in proportion to the current degree of the node so as to keep the random jump probability constant. It can be shown that PageRank-style smoothing does not lead to unconditional axiom satisfaction, and hence it is not an appropriate strategy for designing well-behaved similarity functions.

² http://www.informatik.uni-trier.de/~ley/db/
6 Related Work

Similarity, distance, and distortion measures have been an active research topic for several decades across many areas of computer science and mathematics; this paper focuses on a narrow subset of them that has high practical significance: co-occurrence-based similarity. Beyond the popular similarity functions introduced in Section 2, a number of other measures have been studied in the context of link prediction in social networks [11]. In recent work, Sarkar et al. performed a learning-theoretic analysis of several link prediction heuristics under the assumption that they approximate a distance metric in a latent space [16]. Our approach avoids relying on metric assumptions, as it has been demonstrated in the cognitive psychology literature that their key properties (minimality, symmetry, triangle inequality) are routinely violated in application domains [20, 12].

This paper’s core contribution lies in developing an axiomatic approach for analyzing the capacity of various similarity methods to satisfy properties desired of them during continuous co-occurrence data aggregation. The axiomatic approach has previously proven particularly fruitful in the analysis of clustering functions, where its introduction by Kleinberg [10] has been followed by a number of results that study a variety of axiomatic properties for different clustering methods [4, 1]. The axiomatic approach has also been studied in the context of information retrieval, where it was employed to analyze retrieval models [7, 8].

7  Conclusion 

In this paper, we propose an axiomatic approach to analyzing co-occurrence similarity functions. The main contributions are the following.

1. Axiomatic framework. We propose a principled methodology for analyzing co-occurrence similarity
functions  based  on  differential  response  to  new  observations. 

2.  Analysis  and  proofs.  We  perform  extensive  analysis  on  a  number  of  common  similarity  functions 
using  our  proposed  unifying  abstraction,  and  prove  that  there  exists  no  single  method  which  satisfies 
all  conditions  unconditionally. 

3. Design of new similarity function. We demonstrate how axiomatic analysis allows designing a new data-driven, theoretically well-justified co-occurrence similarity function without degenerate properties.

Future research directions include extending the axioms to capture important properties of other similarity functions in more general contexts.
A Proofs

In this section, we give proofs of the analysis in Table 1.

A.1 Proofs for New Context Monotonicity (NCM)

It  is  obvious  that  Common  Neighbors  satisfies  A-NCM  and  R-NCM  from  its  definition,  so  we  omit  the 
proofs.  We  prove  conditions  for  the  Cosine  similarity. 

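Lemma 10 Cosine similarity satisfies A-NCM if and only if $\frac{W_{q\hat{c}}W_{u\hat{c}} + \sum_{c\neq\hat{c}} W_{qc}W_{uc}}{\sqrt{\|W_{q:}\|_2^2 + W_{q\hat{c}}^2}\sqrt{\|W_{u:}\|_2^2 + W_{u\hat{c}}^2}} > f_W(q,u)$.

Proof 10

$$f_{W'}(q,u) > f_W(q,u) \iff \frac{W_{q\hat{c}}W_{u\hat{c}} + \sum_{c\neq\hat{c}} W_{qc}W_{uc}}{\sqrt{\|W_{q:}\|_2^2 + W_{q\hat{c}}^2}\sqrt{\|W_{u:}\|_2^2 + W_{u\hat{c}}^2}} > f_W(q,u).$$

□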

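Lemma 11 Cosine similarity satisfies R-NCM if and only if $\frac{W_{q\hat{c}}W_{u\hat{c}} + \sum_{c\neq\hat{c}} W_{qc}W_{uc}}{\|W_{q:}\|_2\sqrt{\|W_{u:}\|_2^2 + W_{u\hat{c}}^2}} > f_W(q,v)$.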

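Proof 11

$$\begin{aligned}
f_{W'}(q,u) &> f_{W'}(q,v) \\
\frac{W_{q\hat{c}}W_{u\hat{c}} + \sum_{c\neq\hat{c}} W_{qc}W_{uc}}{\sqrt{\|W_{q:}\|_2^2 + W_{q\hat{c}}^2}\sqrt{\|W_{u:}\|_2^2 + W_{u\hat{c}}^2}} &> \frac{\sum_c W_{qc}W_{vc}}{\sqrt{\|W_{q:}\|_2^2 + W_{q\hat{c}}^2}\,\|W_{v:}\|_2}.
\end{aligned}$$

An equivalent condition to the last line is

$$\frac{W_{q\hat{c}}W_{u\hat{c}} + \sum_{c\neq\hat{c}} W_{qc}W_{uc}}{\|W_{q:}\|_2\sqrt{\|W_{u:}\|_2^2 + W_{u\hat{c}}^2}} > \frac{\sum_c W_{qc}W_{vc}}{\|W_{q:}\|_2\,\|W_{v:}\|_2} = f_W(q,v).$$

□

Next, we prove conditions for the Jaccard similarity.

Lemma 12 Jaccard similarity satisfies A-NCM if and only if $1 > f_W(q,u)$.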
Proof 12

$$\begin{aligned}
f_{W'}(q,u) &> f_W(q,u) \\
\frac{|\Gamma(q)\cap\Gamma(u)| + 1}{|\Gamma(q)\cup\Gamma(u)| + 1} &> \frac{|\Gamma(q)\cap\Gamma(u)|}{|\Gamma(q)\cup\Gamma(u)|} \\
|\Gamma(q)\cup\Gamma(u)| &> |\Gamma(q)\cap\Gamma(u)| \\
1 &> f_W(q,u)
\end{aligned}$$

□


Lemma  13  Jaccard  similarity  always  satisfies  R-NCM. 

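Proof 13 For R-NCM to hold, the following must be true for all v such that $f_W(q,u) > f_W(q,v)$:

$$f_{W'}(q,u) > f_{W'}(q,v) \iff \frac{|\Gamma(q)\cap\Gamma(u)| + 1}{|\Gamma(q)\cup\Gamma(u)| + 1} > \frac{|\Gamma(q)\cap\Gamma(v)|}{|\Gamma(q)\cup\Gamma(v)| + 1},$$

which is always true since

$$\frac{|\Gamma(q)\cap\Gamma(u)| + 1}{|\Gamma(q)\cup\Gamma(u)| + 1} \ge \frac{|\Gamma(q)\cap\Gamma(u)|}{|\Gamma(q)\cup\Gamma(u)|} > \frac{|\Gamma(q)\cap\Gamma(v)|}{|\Gamma(q)\cup\Gamma(v)|} \ge \frac{|\Gamma(q)\cap\Gamma(v)|}{|\Gamma(q)\cup\Gamma(v)| + 1}.$$

□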


Next,  we  prove  conditions  for  the  Pointwise  Mutual  Information. 

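Lemma 14 Pointwise Mutual Information satisfies A-NCM if and only if $\frac{1}{|\Gamma(q)|+|\Gamma(u)|+1} > f_W(q,u)$.

Proof 14

$$\begin{aligned}
f_{W'}(q,u) &> f_W(q,u) \\
\frac{|\Gamma(q)\cap\Gamma(u)| + 1}{(|\Gamma(q)|+1)(|\Gamma(u)|+1)} &> \frac{|\Gamma(q)\cap\Gamma(u)|}{|\Gamma(q)||\Gamma(u)|} \\
\frac{|\Gamma(q)||\Gamma(u)|}{|\Gamma(q)|+|\Gamma(u)|+1} &> |\Gamma(q)\cap\Gamma(u)| \\
\frac{1}{|\Gamma(q)|+|\Gamma(u)|+1} &> \frac{|\Gamma(q)\cap\Gamma(u)|}{|\Gamma(q)||\Gamma(u)|} = f_W(q,u)
\end{aligned}$$

□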
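To see Lemma 14 at work, the sketch below evaluates PMI in the un-logged ratio form that the proof manipulates, $|\Gamma(q)\cap\Gamma(u)|/(|\Gamma(q)||\Gamma(u)|)$, with $\Gamma(\cdot)$ taken as the set of contexts with nonzero weight. It compares the closed-form condition with a direct before/after evaluation; the helper names and the two toy examples are hypothetical.

```python
def pmi(W, q, u):
    """|Gamma(q) & Gamma(u)| / (|Gamma(q)| * |Gamma(u)|) over contexts with nonzero weight."""
    gq = {c for c, w in enumerate(W[q]) if w > 0}
    gu = {c for c, w in enumerate(W[u]) if w > 0}
    return len(gq & gu) / (len(gq) * len(gu))


def lemma14_condition(W, q, u):
    """1 / (|Gamma(q)| + |Gamma(u)| + 1) > f_W(q, u)."""
    n_q = sum(1 for w in W[q] if w > 0)
    n_u = sum(1 for w in W[u] if w > 0)
    return 1 / (n_q + n_u + 1) > pmi(W, q, u)


def add_shared_context(W, q, u):
    """Append a new context observed only by q and u."""
    return [row + [1.0 if i in (q, u) else 0.0] for i, row in enumerate(W)]


examples = {
    "sparse overlap": [[1, 1, 1, 0, 0, 0], [1, 0, 0, 1, 1, 1]],
    "full overlap": [[1, 1], [1, 1]],
}
for name, W in examples.items():
    W_new = add_shared_context(W, 0, 1)
    print(name, lemma14_condition(W, 0, 1), pmi(W_new, 0, 1) > pmi(W, 0, 1))
```

In the sparse case the score rises from 1/12 to 1/10 as the condition predicts; when q and u already share all their contexts, the new context lowers PMI from 1/2 to 1/3, so A-NCM fails.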


Lemma 15 Pointwise Mutual Information satisfies R-NCM if and only if $\frac{1}{|\Gamma(q)|} > f_W(q,u)$.
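Proof 15 For R-NCM to hold, the following must be true for all v such that $f_W(q,u) > f_W(q,v)$. Since v does not occur in the new context, $f_{W'}(q,v) = \frac{|\Gamma(q)|}{|\Gamma(q)|+1} f_W(q,v)$, so the new score of u is compared against $\frac{|\Gamma(q)|}{|\Gamma(q)|+1} f_W(q,u)$:

$$\begin{aligned}
\frac{|\Gamma(q)\cap\Gamma(u)| + 1}{(|\Gamma(q)|+1)(|\Gamma(u)|+1)} &> \frac{|\Gamma(q)\cap\Gamma(u)|}{(|\Gamma(q)|+1)|\Gamma(u)|} \\
|\Gamma(u)| &> |\Gamma(q)\cap\Gamma(u)| \\
\frac{1}{|\Gamma(q)|} &> f_W(q,u)
\end{aligned}$$

□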


Next,  we  prove  conditions  for  the  Forward  Random  Walk. 
Lemma 16 Forward Random Walk satisfies A-NCM if and only if $\frac{W_{u\hat{c}}}{W_{q\hat{c}} + W_{u\hat{c}}} > f_W(q,u)$.

Proof 16 For A-NCM to hold, the following must be true:

$$\begin{aligned}
f_{W'}(q,u) &> f_W(q,u) \\
\frac{W_{q\hat{c}}}{Q_{qq}+W_{q\hat{c}}}\frac{W_{u\hat{c}}}{W_{q\hat{c}}+W_{u\hat{c}}} + \sum_{c\neq\hat{c}} \frac{W_{qc}}{Q_{qq}+W_{q\hat{c}}}\frac{W_{uc}}{D_{cc}} &> \sum_{c\neq\hat{c}} \frac{W_{qc}}{Q_{qq}}\frac{W_{uc}}{D_{cc}} \\
\frac{W_{q\hat{c}}}{Q_{qq}+W_{q\hat{c}}}\frac{W_{u\hat{c}}}{W_{q\hat{c}}+W_{u\hat{c}}} &> \sum_{c\neq\hat{c}} \frac{W_{q\hat{c}}W_{qc}W_{uc}}{Q_{qq}D_{cc}(Q_{qq}+W_{q\hat{c}})} \\
\frac{W_{u\hat{c}}}{W_{q\hat{c}}+W_{u\hat{c}}} &> \sum_{c\neq\hat{c}} \frac{W_{qc}W_{uc}}{Q_{qq}D_{cc}} = f_W(q,u),
\end{aligned}$$

which holds only when the new context occurrence weights $W_{q\hat{c}}$ and $W_{u\hat{c}}$ satisfy the last inequality. Thus, for the A-NCM axiom to hold for forward random walks, the target node’s proportion of occurrences in the new context must exceed the current value of similarity. □
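Lemma 16 lends itself to a quick randomized sanity check. The sketch compares the closed-form condition with a direct evaluation of the scores; it assumes, as the proof does, that the new context ĉ is observed only by q and u, and the random-matrix sizes, weight ranges, tie margin, and helper names are arbitrary illustrative choices.

```python
import random


def frw(W, q, u):
    """Forward Random Walk score: sum_c W[q][c] * W[u][c] / (Q_qq * D_cc)."""
    Q_qq = sum(W[q])
    D = [sum(col) for col in zip(*W)]
    return sum(W[q][c] * W[u][c] / (Q_qq * D[c]) for c in range(len(D)) if D[c] > 0)


def count_lemma16_mismatches(trials=1000, seed=0, margin=1e-9):
    """Compare the Lemma 16 condition with a direct A-NCM check on random data."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(trials):
        # 4 items x 3 contexts with sparse random weights; q = 0 and u = 1 share context 0.
        W = [[rng.uniform(0.1, 2.0) if rng.random() < 0.5 else 0.0 for _ in range(3)]
             for _ in range(4)]
        W[0][0] = W[1][0] = 1.0
        w_q, w_u = rng.uniform(0.1, 2.0), rng.uniform(0.1, 2.0)
        # New context c_hat observed only by q and u.
        W_new = [row + [w_q if i == 0 else (w_u if i == 1 else 0.0)]
                 for i, row in enumerate(W)]
        before = frw(W, 0, 1)
        lhs = w_u / (w_q + w_u)
        if abs(lhs - before) < margin:
            continue  # skip near-ties where rounding could flip either comparison
        if (lhs > before) != (frw(W_new, 0, 1) > before):
            mismatches += 1
    return mismatches


print("mismatches:", count_lemma16_mismatches())
```

A zero mismatch count is what the lemma predicts, since it states a necessary and sufficient condition; near-ties are skipped only to keep floating-point rounding from deciding the comparison.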

Lemma  17  Forward  Random  Walk  satisfies  R-NCM  always. 

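Proof 17 For R-NCM to hold, the following must be true for all v such that $f_W(q,u) > f_W(q,v)$:

$$\begin{aligned}
f_{W'}(q,u) &> f_{W'}(q,v) \\
\frac{W_{q\hat{c}}}{Q_{qq}+W_{q\hat{c}}}\frac{W_{u\hat{c}}}{W_{q\hat{c}}+W_{u\hat{c}}} + \sum_{c\neq\hat{c}} \frac{W_{qc}}{Q_{qq}+W_{q\hat{c}}}\frac{W_{uc}}{D_{cc}} &> \sum_{c\neq\hat{c}} \frac{W_{qc}}{Q_{qq}+W_{q\hat{c}}}\frac{W_{vc}}{D_{cc}} \\
\frac{W_{q\hat{c}}}{Q_{qq}+W_{q\hat{c}}}\frac{W_{u\hat{c}}}{W_{q\hat{c}}+W_{u\hat{c}}} + \frac{Q_{qq}}{Q_{qq}+W_{q\hat{c}}}\sum_{c\neq\hat{c}} \frac{W_{qc}(W_{uc}-W_{vc})}{Q_{qq}D_{cc}} &> 0 \\
\frac{W_{q\hat{c}}}{Q_{qq}+W_{q\hat{c}}}\frac{W_{u\hat{c}}}{W_{q\hat{c}}+W_{u\hat{c}}} + \frac{Q_{qq}}{Q_{qq}+W_{q\hat{c}}}\left(f_W(q,u) - f_W(q,v)\right) &> 0,
\end{aligned}$$

which is always true as the first summand is positive and the second summand is nonnegative; hence the R-NCM axiom is always satisfied for Forward Random Walk similarity. □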

For the Backward Random Walk (BRW), it can be shown that BRW satisfies A-NCM if and only if $\frac{W_{q\hat{c}}}{W_{q\hat{c}} + W_{u\hat{c}}} > f_W(q,u)$, using a derivation very similar to that of A-NCM for FRW. Since the addition of a new context does not affect other target nodes in NCM for BRW, the condition for BRW to satisfy R-NCM is exactly the same as that for A-NCM.

Next,  we  prove  conditions  for  the  Mean  Meeting  Time. 

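Lemma 18 Mean Meeting Time satisfies A-NCM if and only if $W_{q\hat{c}}W_{u\hat{c}} > (\lambda - 1)\sum_{c\neq\hat{c}} W_{qc}W_{uc}$, where

$$\lambda = \frac{(Q_{qq}+W_{q\hat{c}})(Q_{uu}+W_{u\hat{c}})}{Q_{qq}Q_{uu}}.$$

Proof 18 For A-NCM to hold, the following must be true:

$$\begin{aligned}
f_{W'}(q,u) &> f_W(q,u) \\
\frac{W_{q\hat{c}}}{Q_{qq}+W_{q\hat{c}}}\frac{W_{u\hat{c}}}{Q_{uu}+W_{u\hat{c}}} + \sum_{c\neq\hat{c}} \frac{W_{qc}}{Q_{qq}+W_{q\hat{c}}}\frac{W_{uc}}{Q_{uu}+W_{u\hat{c}}} &> \sum_{c\neq\hat{c}} \frac{W_{qc}W_{uc}}{Q_{qq}Q_{uu}}.
\end{aligned}$$

Thus,

$$W_{q\hat{c}}W_{u\hat{c}} > (\lambda - 1)\sum_{c\neq\hat{c}} W_{qc}W_{uc}.$$

□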
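Lemma 18 can be checked numerically in the same spirit. The sketch assumes, as Proof 18 does, that Mean Meeting Time is evaluated in the form $\sum_c W_{qc}W_{uc}/(Q_{qq}Q_{uu})$ and that the new context contains only q and u; the function names and the toy weights are hypothetical.

```python
def mmt(W, q, u):
    """Score in the form used by Proof 18: sum_c W[q][c] W[u][c] / (Q_qq Q_uu)."""
    shared = sum(a * b for a, b in zip(W[q], W[u]))
    return shared / (sum(W[q]) * sum(W[u]))


def lemma18_condition(W, q, u, w_q, w_u):
    """W_qc W_uc > (lambda - 1) * sum_c W_qc W_uc for a new context seen by q and u."""
    Q_qq, Q_uu = sum(W[q]), sum(W[u])
    lam = (Q_qq + w_q) * (Q_uu + w_u) / (Q_qq * Q_uu)
    shared = sum(a * b for a, b in zip(W[q], W[u]))
    return w_q * w_u > (lam - 1) * shared


W = [[2.0, 1.0],   # q
     [1.0, 3.0]]   # u
for w_q, w_u in [(1.0, 1.0), (10.0, 10.0)]:
    W_new = [row + [w] for row, w in zip(W, (w_q, w_u))]
    print((w_q, w_u), lemma18_condition(W, 0, 1, w_q, w_u),
          mmt(W_new, 0, 1) > mmt(W, 0, 1))
```

With unit weights in the new context the score falls from 5/12 to 0.3, so A-NCM fails; only a heavy new co-occurrence (here weight 10 for both items) outweighs the growth of both degree normalizers.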


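Lemma 19 Mean Meeting Time satisfies R-NCM if and only if $W_{q\hat{c}}W_{u\hat{c}} > \sum_{c\neq\hat{c}}(\gamma W_{qc}W_{vc} - W_{qc}W_{uc})$, where $\gamma = \frac{Q_{uu}+W_{u\hat{c}}}{Q_{vv}}$.

Proof 19 For R-NCM to hold, the following must be true:

$$\begin{aligned}
f_{W'}(q,u) &> f_{W'}(q,v) \\
\frac{W_{q\hat{c}}}{Q_{qq}+W_{q\hat{c}}}\frac{W_{u\hat{c}}}{Q_{uu}+W_{u\hat{c}}} + \sum_{c\neq\hat{c}} \frac{W_{qc}}{Q_{qq}+W_{q\hat{c}}}\frac{W_{uc}}{Q_{uu}+W_{u\hat{c}}} &> \sum_{c\neq\hat{c}} \frac{W_{qc}}{Q_{qq}+W_{q\hat{c}}}\frac{W_{vc}}{Q_{vv}} \\
\frac{W_{q\hat{c}}W_{u\hat{c}}}{Q_{uu}+W_{u\hat{c}}} + \sum_{c\neq\hat{c}} \frac{W_{qc}W_{uc}}{Q_{uu}+W_{u\hat{c}}} &> \sum_{c\neq\hat{c}} \frac{W_{qc}W_{vc}}{Q_{vv}}.
\end{aligned}$$

Thus,

$$W_{q\hat{c}}W_{u\hat{c}} > \sum_{c\neq\hat{c}} (\gamma W_{qc}W_{vc} - W_{qc}W_{uc}).$$

□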


Next,  we  prove  conditions  for  the  Random  Walk  with  Sink. 

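Lemma 20 RWS satisfies A-NCM if and only if $s > -Q_{qq} + \frac{W_{q\hat{c}} + W_{u\hat{c}}}{W_{u\hat{c}}}\sum_{c\neq\hat{c}} \frac{W_{qc}W_{uc}}{D_{cc}}$.

Proof 20

$$\begin{aligned}
f_{W'}(q,u) &> f_W(q,u) \\
\frac{W_{q\hat{c}}}{s+Q_{qq}+W_{q\hat{c}}}\frac{W_{u\hat{c}}}{W_{q\hat{c}}+W_{u\hat{c}}} + \sum_{c\neq\hat{c}} \frac{W_{qc}}{s+Q_{qq}+W_{q\hat{c}}}\frac{W_{uc}}{D_{cc}} &> \sum_{c\neq\hat{c}} \frac{W_{qc}}{s+Q_{qq}}\frac{W_{uc}}{D_{cc}} \\
\frac{W_{q\hat{c}}}{s+Q_{qq}+W_{q\hat{c}}}\frac{W_{u\hat{c}}}{W_{q\hat{c}}+W_{u\hat{c}}} &> \sum_{c\neq\hat{c}} \frac{W_{q\hat{c}}W_{qc}W_{uc}}{(s+Q_{qq})D_{cc}(s+Q_{qq}+W_{q\hat{c}})} \\
\frac{W_{u\hat{c}}}{W_{q\hat{c}}+W_{u\hat{c}}} &> \sum_{c\neq\hat{c}} \frac{W_{qc}W_{uc}}{(s+Q_{qq})D_{cc}}
\end{aligned}$$

The lemma is proved by rearranging terms in the last inequality. □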


A.2  Proofs  for  Query  Co-occurrence  Monotonicity  (QCM) 

It  is  obvious  that  Common  Neighbors  satisfies  A-QCM  and  R-QCM,  so  we  omit  the  proofs.  We  prove 
conditions  for  the  Cosine  similarity. 

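Lemma 21 Cosine similarity satisfies A-QCM if and only if $\frac{\epsilon W_{u\hat{c}} + \sum_c W_{qc}W_{uc}}{\sqrt{\|W_{q:}\|_2^2 + \epsilon(\epsilon + 2W_{q\hat{c}})}\,\|W_{u:}\|_2} > f_W(q,u)$.

Proof 21 For A-QCM to hold for the Cosine similarity, the following must be true:

$$\begin{aligned}
f_{W'}(q,u) &> f_W(q,u) \\
\frac{(W_{q\hat{c}}+\epsilon)W_{u\hat{c}} + \sum_{c\neq\hat{c}} W_{qc}W_{uc}}{\sqrt{\|W_{q:}\|_2^2 + \epsilon(\epsilon + 2W_{q\hat{c}})}\,\|W_{u:}\|_2} &> f_W(q,u) \\
\frac{\epsilon W_{u\hat{c}} + \sum_c W_{qc}W_{uc}}{\sqrt{\|W_{q:}\|_2^2 + \epsilon(\epsilon + 2W_{q\hat{c}})}\,\|W_{u:}\|_2} &> f_W(q,u)
\end{aligned}$$

□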


The Jaccard similarity, the Pointwise Mutual Information, and the Adamic-Adar similarity satisfy neither A-QCM nor R-QCM, since they do not consider the edge weights.

Next,  we  prove  conditions  for  the  Forward  Random  Walk. 

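Lemma 22 Forward Random Walk satisfies A-QCM if and only if $\epsilon < \frac{1}{f_W(q,u)}\left(W_{u\hat{c}} - \frac{W_{q\hat{c}}W_{u\hat{c}}}{D_{\hat{c}\hat{c}}}\right) - D_{\hat{c}\hat{c}}$.

Proof 22 For A-QCM to hold for the Forward Random Walk, the following must be true:

$$\begin{aligned}
f_{W'}(q,u) &> f_W(q,u) \\
\frac{W_{q\hat{c}}+\epsilon}{Q_{qq}+\epsilon}\frac{W_{u\hat{c}}}{D_{\hat{c}\hat{c}}+\epsilon} + \sum_{c\neq\hat{c}} \frac{W_{qc}}{Q_{qq}+\epsilon}\frac{W_{uc}}{D_{cc}} &> \frac{W_{q\hat{c}}}{Q_{qq}}\frac{W_{u\hat{c}}}{D_{\hat{c}\hat{c}}} + \sum_{c\neq\hat{c}} \frac{W_{qc}}{Q_{qq}}\frac{W_{uc}}{D_{cc}} \\
\frac{(W_{q\hat{c}}+\epsilon)W_{u\hat{c}}}{D_{\hat{c}\hat{c}}+\epsilon} - \frac{W_{q\hat{c}}W_{u\hat{c}}}{D_{\hat{c}\hat{c}}} &> \epsilon f_W(q,u) \\
\frac{\epsilon W_{u\hat{c}}(D_{\hat{c}\hat{c}} - W_{q\hat{c}})}{D_{\hat{c}\hat{c}}(D_{\hat{c}\hat{c}}+\epsilon)} &> \epsilon f_W(q,u) \\
\epsilon &< \frac{1}{f_W(q,u)}\left(W_{u\hat{c}} - \frac{W_{q\hat{c}}W_{u\hat{c}}}{D_{\hat{c}\hat{c}}}\right) - D_{\hat{c}\hat{c}}
\end{aligned}$$

where the third line multiplies both sides by $Q_{qq}+\epsilon$ and uses $Q_{qq} f_W(q,u) = \frac{W_{q\hat{c}}W_{u\hat{c}}}{D_{\hat{c}\hat{c}}} + \sum_{c\neq\hat{c}}\frac{W_{qc}W_{uc}}{D_{cc}}$.

We note that for the case where q and u are the only nodes occurring in context ĉ, and therefore $D_{\hat{c}\hat{c}} = W_{q\hat{c}} + W_{u\hat{c}}$, the above constraint is equivalent to:

$$\epsilon < \frac{1}{f_W(q,u)}\frac{W_{u\hat{c}}^2}{W_{q\hat{c}} + W_{u\hat{c}}} - W_{u\hat{c}} - W_{q\hat{c}}.$$

□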
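The bound in Lemma 22 can be compared against a direct evaluation of FRW before and after the query's weight in ĉ is increased by ε. This is an illustrative sketch only; the toy matrix, the ε values, and the helper names `frw`, `lemma22_bound`, and `bump` are hypothetical.

```python
def frw(W, q, u):
    """Forward Random Walk score: sum_c W[q][c] * W[u][c] / (Q_qq * D_cc)."""
    Q_qq = sum(W[q])
    D = [sum(col) for col in zip(*W)]
    return sum(W[q][c] * W[u][c] / (Q_qq * D[c]) for c in range(len(D)) if D[c] > 0)


def lemma22_bound(W, q, u, c_hat):
    """Right-hand side of Lemma 22: (W_uc - W_qc W_uc / D_cc) / f_W(q,u) - D_cc."""
    D_cc = sum(row[c_hat] for row in W)
    w_q, w_u = W[q][c_hat], W[u][c_hat]
    return (w_u - w_q * w_u / D_cc) / frw(W, q, u) - D_cc


def bump(W, item, c, eps):
    """Copy of W with eps added to W[item][c]."""
    W2 = [row[:] for row in W]
    W2[item][c] += eps
    return W2


# Rows: q = 0, u = 1, v = 2; u is concentrated in context 0.
W = [[1.0, 1.0],
     [4.0, 0.0],
     [0.0, 1.0]]
bound = lemma22_bound(W, 0, 1, 0)
print("epsilon must be below", round(bound, 4))
for eps in (0.5, 5.0):
    print(eps, eps < bound, frw(bump(W, 0, 0, eps), 0, 1) > frw(W, 0, 1))
```

Here $f_W(q,u) = 0.4$ and the bound is 3: adding 0.5 to $W_{q\hat{c}}$ raises the score, while adding 5 lowers it, because the growth of $Q_{qq}$ and $D_{\hat{c}\hat{c}}$ then dominates.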


Next,  we  prove  conditions  for  the  Mean  Meeting  Time. 

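Lemma 23 Mean Meeting Time satisfies A-QCM if and only if $\left(\frac{W_{q\hat{c}}+\epsilon}{Q_{qq}+\epsilon} - \frac{W_{q\hat{c}}}{Q_{qq}}\right)\frac{W_{u\hat{c}}}{Q_{uu}} > \frac{\epsilon}{Q_{qq}+\epsilon}\sum_{c\neq\hat{c}}\frac{W_{qc}W_{uc}}{Q_{qq}Q_{uu}}$.

Proof 23 For A-QCM to hold, the following must be true:

$$\begin{aligned}
f_{W'}(q,u) &> f_W(q,u) \\
\sum_{c\neq\hat{c}} \frac{W_{qc}}{Q_{qq}+\epsilon}\frac{W_{uc}}{Q_{uu}} + \frac{W_{q\hat{c}}+\epsilon}{Q_{qq}+\epsilon}\frac{W_{u\hat{c}}}{Q_{uu}} &> \sum_{c\neq\hat{c}} \frac{W_{qc}W_{uc}}{Q_{qq}Q_{uu}} + \frac{W_{q\hat{c}}}{Q_{qq}}\frac{W_{u\hat{c}}}{Q_{uu}} \\
\left(\frac{W_{q\hat{c}}+\epsilon}{Q_{qq}+\epsilon} - \frac{W_{q\hat{c}}}{Q_{qq}}\right)\frac{W_{u\hat{c}}}{Q_{uu}} &> \frac{\epsilon}{Q_{qq}+\epsilon}\sum_{c\neq\hat{c}} \frac{W_{qc}W_{uc}}{Q_{qq}Q_{uu}}
\end{aligned}$$

□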


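Lemma 24 Mean Meeting Time satisfies R-QCM if and only if $\epsilon > -W_{q\hat{c}} + \frac{Q_{uu}}{W_{u\hat{c}}}\sum_{c\neq\hat{c}} W_{qc}\left(\frac{W_{vc}}{Q_{vv}} - \frac{W_{uc}}{Q_{uu}}\right)$.

Proof 24 For R-QCM to hold, the following must be true:

$$\begin{aligned}
f_{W'}(q,u) &> f_{W'}(q,v) \\
\sum_{c\neq\hat{c}} \frac{W_{qc}}{Q_{qq}+\epsilon}\frac{W_{uc}}{Q_{uu}} + \frac{W_{q\hat{c}}+\epsilon}{Q_{qq}+\epsilon}\frac{W_{u\hat{c}}}{Q_{uu}} &> \sum_{c\neq\hat{c}} \frac{W_{qc}}{Q_{qq}+\epsilon}\frac{W_{vc}}{Q_{vv}} \\
(W_{q\hat{c}}+\epsilon)\frac{W_{u\hat{c}}}{Q_{uu}} &> \sum_{c\neq\hat{c}} W_{qc}\left(\frac{W_{vc}}{Q_{vv}} - \frac{W_{uc}}{Q_{uu}}\right) \\
\epsilon &> -W_{q\hat{c}} + \frac{Q_{uu}}{W_{u\hat{c}}}\sum_{c\neq\hat{c}} W_{qc}\left(\frac{W_{vc}}{Q_{vv}} - \frac{W_{uc}}{Q_{uu}}\right)
\end{aligned}$$

□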


Next,  we  show  that  RWS  satisfies  A-QCM  with  an  appropriate  setting  of  the  s  parameter. 


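Lemma 25 RWS satisfies A-QCM if and only if

$$s > \frac{1}{D_{\hat{c}\hat{c}} - W_{q\hat{c}}}\left( W_{q\hat{c}}(D_{\hat{c}\hat{c}} + \epsilon) - Q_{qq}(D_{\hat{c}\hat{c}} - W_{q\hat{c}}) + \frac{(D_{\hat{c}\hat{c}}+\epsilon)\, D_{\hat{c}\hat{c}}}{W_{u\hat{c}}} \sum_{c \neq \hat{c}} \frac{W_{qc}W_{uc}}{D_{cc}} \right).$$

Proof 25

$$\begin{aligned}
f_{W'}(q,u) &> f_W(q,u) \\
\frac{W_{q\hat{c}}+\epsilon}{s+Q_{qq}+\epsilon}\frac{W_{u\hat{c}}}{D_{\hat{c}\hat{c}}+\epsilon} + \sum_{c\neq\hat{c}} \frac{W_{qc}}{s+Q_{qq}+\epsilon}\frac{W_{uc}}{D_{cc}} &> \frac{W_{q\hat{c}}}{s+Q_{qq}}\frac{W_{u\hat{c}}}{D_{\hat{c}\hat{c}}} + \sum_{c\neq\hat{c}} \frac{W_{qc}}{s+Q_{qq}}\frac{W_{uc}}{D_{cc}}
\end{aligned}$$

The lemma is proved by rearranging terms in the last inequality.

□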


A.3  Proofs  for  Target  Co-occurrence  Monotonicity  (TCM) 

It  is  obvious  that  Common  Neighbors  satisfies  A-TCM  and  R-TCM,  so  we  omit  the  proofs.  We  prove 
conditions  for  the  Cosine  similarity. 

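Lemma 26 Cosine similarity satisfies A-TCM if and only if $\frac{\epsilon W_{q\hat{c}} + \sum_c W_{qc}W_{uc}}{\|W_{q:}\|_2\sqrt{\|W_{u:}\|_2^2 + \epsilon(\epsilon + 2W_{u\hat{c}})}} > f_W(q,u)$.

Proof 26 For A-TCM to hold for cosine, the following must be true:

$$\begin{aligned}
f_{W'}(q,u) &> f_W(q,u) \\
\frac{(W_{u\hat{c}}+\epsilon)W_{q\hat{c}} + \sum_{c\neq\hat{c}} W_{qc}W_{uc}}{\|W_{q:}\|_2\sqrt{\|W_{u:}\|_2^2 + \epsilon(\epsilon + 2W_{u\hat{c}})}} &> f_W(q,u) \\
\frac{\epsilon W_{q\hat{c}} + \sum_c W_{qc}W_{uc}}{\|W_{q:}\|_2\sqrt{\|W_{u:}\|_2^2 + \epsilon(\epsilon + 2W_{u\hat{c}})}} &> f_W(q,u)
\end{aligned}$$

□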


Since the increase of the weight of the target edge does not affect other target nodes’ Cosine similarities, the condition for the Cosine similarity to satisfy R-TCM is exactly the same as that for A-TCM.

The Jaccard similarity, the Pointwise Mutual Information, and the Adamic-Adar similarity satisfy neither A-TCM nor R-TCM, since they do not consider the edge weights.

Next,  we  prove  conditions  for  the  Backward  Random  Walk. 

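Lemma 27 Backward Random Walk satisfies A-TCM if and only if $\epsilon < \frac{1}{f_W(q,u)}\left(W_{q\hat{c}} - \frac{W_{q\hat{c}}W_{u\hat{c}}}{D_{\hat{c}\hat{c}}}\right) - D_{\hat{c}\hat{c}}$.

Proof 27 For A-TCM to hold for the Backward Random Walk, the following must be true:

$$\begin{aligned}
f_{W'}(q,u) &> f_W(q,u) \\
\frac{W_{q\hat{c}}}{D_{\hat{c}\hat{c}}+\epsilon}\frac{W_{u\hat{c}}+\epsilon}{Q_{uu}+\epsilon} + \sum_{c\neq\hat{c}} \frac{W_{qc}}{D_{cc}}\frac{W_{uc}}{Q_{uu}+\epsilon} &> \frac{W_{q\hat{c}}}{D_{\hat{c}\hat{c}}}\frac{W_{u\hat{c}}}{Q_{uu}} + \sum_{c\neq\hat{c}} \frac{W_{qc}}{D_{cc}}\frac{W_{uc}}{Q_{uu}} \\
\frac{W_{q\hat{c}}(W_{u\hat{c}}+\epsilon)}{D_{\hat{c}\hat{c}}+\epsilon} - \frac{W_{q\hat{c}}W_{u\hat{c}}}{D_{\hat{c}\hat{c}}} &> \epsilon f_W(q,u) \\
\frac{\epsilon W_{q\hat{c}}(D_{\hat{c}\hat{c}} - W_{u\hat{c}})}{D_{\hat{c}\hat{c}}(D_{\hat{c}\hat{c}}+\epsilon)} &> \epsilon f_W(q,u) \\
\epsilon &< \frac{1}{f_W(q,u)}\left(W_{q\hat{c}} - \frac{W_{q\hat{c}}W_{u\hat{c}}}{D_{\hat{c}\hat{c}}}\right) - D_{\hat{c}\hat{c}}
\end{aligned}$$

For the case where q and u are the only nodes occurring in context ĉ, and therefore $D_{\hat{c}\hat{c}} = W_{q\hat{c}} + W_{u\hat{c}}$, the above constraint is equivalent to:

$$\epsilon < \frac{1}{f_W(q,u)}\frac{W_{q\hat{c}}^2}{W_{u\hat{c}} + W_{q\hat{c}}} - W_{q\hat{c}} - W_{u\hat{c}}.$$

□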


Since the addition of a weight to the target context does not affect other target nodes in TCM for BRW, the condition for BRW to satisfy R-TCM is exactly the same as that for A-TCM.

Next,  we  prove  conditions  for  the  Mean  Meeting  Time. 


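Lemma 28 Mean Meeting Time satisfies A-TCM if and only if $\left(\frac{W_{u\hat{c}}+\epsilon}{Q_{uu}+\epsilon} - \frac{W_{u\hat{c}}}{Q_{uu}}\right)\frac{W_{q\hat{c}}}{Q_{qq}} > \frac{\epsilon}{Q_{uu}+\epsilon}\sum_{c\neq\hat{c}}\frac{W_{qc}W_{uc}}{Q_{uu}Q_{qq}}$.

Proof 28 For A-TCM to hold, the following must be true:

$$\begin{aligned}
f_{W'}(q,u) &> f_W(q,u) \\
\sum_{c\neq\hat{c}} \frac{W_{qc}}{Q_{qq}}\frac{W_{uc}}{Q_{uu}+\epsilon} + \frac{W_{q\hat{c}}}{Q_{qq}}\frac{W_{u\hat{c}}+\epsilon}{Q_{uu}+\epsilon} &> \sum_{c\neq\hat{c}} \frac{W_{qc}W_{uc}}{Q_{qq}Q_{uu}} + \frac{W_{q\hat{c}}}{Q_{qq}}\frac{W_{u\hat{c}}}{Q_{uu}} \\
\left(\frac{W_{u\hat{c}}+\epsilon}{Q_{uu}+\epsilon} - \frac{W_{u\hat{c}}}{Q_{uu}}\right)\frac{W_{q\hat{c}}}{Q_{qq}} &> \frac{\epsilon}{Q_{uu}+\epsilon}\sum_{c\neq\hat{c}} \frac{W_{qc}W_{uc}}{Q_{uu}Q_{qq}}
\end{aligned}$$

□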


Lemma 29 Mean Meeting Time satisfies R-TCM if and only if $\epsilon\left(W_{q\hat{c}} - \sum_c \frac{W_{qc}W_{vc}}{Q_{vv}}\right) > \sum_c W_{qc}\left(\frac{W_{vc}Q_{uu}}{Q_{vv}} - W_{uc}\right)$.

Proof 29 For R-TCM to hold, the following must be true:

$$\begin{aligned}
f_{W'}(q,u) &> f_{W'}(q,v) \\
\frac{W_{q\hat{c}}}{Q_{qq}}\frac{W_{u\hat{c}}+\epsilon}{Q_{uu}+\epsilon} + \sum_{c\neq\hat{c}} \frac{W_{qc}}{Q_{qq}}\frac{W_{uc}}{Q_{uu}+\epsilon} &> \sum_c \frac{W_{qc}}{Q_{qq}}\frac{W_{vc}}{Q_{vv}} \\
\epsilon W_{q\hat{c}} + \sum_c W_{qc}W_{uc} &> (Q_{uu}+\epsilon)\sum_c \frac{W_{qc}W_{vc}}{Q_{vv}} \\
\epsilon\left(W_{q\hat{c}} - \sum_c \frac{W_{qc}W_{vc}}{Q_{vv}}\right) &> \sum_c W_{qc}\left(\frac{W_{vc}Q_{uu}}{Q_{vv}} - W_{uc}\right)
\end{aligned}$$

□


A.4  Proofs  for  Diminishing  Returns  (DR) 


It  is  obvious  that  Common  Neighbors  does  not  satisfy  the  Diminishing  Returns  (DR).  We  prove  conditions 
for  the  Cosine  similarity. 


Lemma  30  Cosine  Similarity  satisfies  DR  if  and  only  if 
fw(q,u). 


2fl2+2E^c,c^VWuc 


™2+Y,^&,cWqcWuc 

s/\\Wq-.\\l+2e2^/\\Wu.\\l+262 


> 


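Proof 30 For DR to hold, the following must be true:

$$f_{\hat W}(q,u) - f_W(q,u) > f_{\tilde W}(q,u) - f_{\hat W}(q,u)$$
$$2f_{\hat W}(q,u) > f_W(q,u) + f_{\tilde W}(q,u)$$
$$\frac{2\theta^2 + 2\sum_{c\neq\hat c,\tilde c} W_{qc}W_{uc}}{\sqrt{\|W_{q:}\|_2^2+\theta^2}\,\sqrt{\|W_{u:}\|_2^2+\theta^2}} > \frac{\sum_{c\neq\hat c,\tilde c} W_{qc}W_{uc}}{\|W_{q:}\|_2\,\|W_{u:}\|_2} + \frac{2\theta^2 + \sum_{c\neq\hat c,\tilde c} W_{qc}W_{uc}}{\sqrt{\|W_{q:}\|_2^2+2\theta^2}\,\sqrt{\|W_{u:}\|_2^2+2\theta^2}}$$
$$\frac{2\theta^2 + 2\sum_{c\neq\hat c,\tilde c} W_{qc}W_{uc}}{\sqrt{\|W_{q:}\|_2^2+\theta^2}\,\sqrt{\|W_{u:}\|_2^2+\theta^2}} - \frac{2\theta^2 + \sum_{c\neq\hat c,\tilde c} W_{qc}W_{uc}}{\sqrt{\|W_{q:}\|_2^2+2\theta^2}\,\sqrt{\|W_{u:}\|_2^2+2\theta^2}} > f_W(q,u)$$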


□ 
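A quick numerical check of Lemma 30 against the DR inequality itself. This is a minimal sketch, assuming cosine similarity over aligned context vectors and modeling each DR increment as a fresh context in which both q and u have weight $\theta$ (so the sums over $c\neq\hat c,\tilde c$ are the original dot product). The function names and inputs are hypothetical; floating point is used, so the examples are chosen well away from ties.

```python
import math

def cosine(q, u):
    dot = sum(a * b for a, b in zip(q, u))
    return dot / (math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in u)))

def dr_direct(q, u, theta):
    """Direct DR check: each step appends a new context where q and u both weigh theta."""
    f0 = cosine(q, u)
    f1 = cosine(q + [theta], u + [theta])
    f2 = cosine(q + [theta, theta], u + [theta, theta])
    return f1 - f0 > f2 - f1

def dr_lemma30(q, u, theta):
    """Closed-form condition of Lemma 30 on the original vectors."""
    S = sum(a * b for a, b in zip(q, u))
    A = sum(a * a for a in q)  # ||W_q:||_2^2
    B = sum(b * b for b in u)  # ||W_u:||_2^2
    t2 = theta ** 2
    lhs = ((2 * t2 + 2 * S) / (math.sqrt(A + t2) * math.sqrt(B + t2))
           - (2 * t2 + S) / (math.sqrt(A + 2 * t2) * math.sqrt(B + 2 * t2)))
    return lhs > cosine(q, u)

print(dr_direct([1, 0], [0, 1], 1.0), dr_direct([1, 0], [10, 1], 1.0))  # True False
```

The second pair starts almost parallel, so the first new context pulls the cosine down sharply and the gains do not diminish.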


Next,  we  prove  conditions  for  the  Jaccard  Similarity. 

Lemma 31 Jaccard Similarity satisfies DR if and only if $1 > f_W(q,u)$.
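Proof 31 For DR to hold, the following must be true:

$$\frac{|\Gamma(q)\cap\Gamma(u)|+1}{|\Gamma(q)\cup\Gamma(u)|+1} - \frac{|\Gamma(q)\cap\Gamma(u)|}{|\Gamma(q)\cup\Gamma(u)|} > \frac{|\Gamma(q)\cap\Gamma(u)|+2}{|\Gamma(q)\cup\Gamma(u)|+2} - \frac{|\Gamma(q)\cap\Gamma(u)|+1}{|\Gamma(q)\cup\Gamma(u)|+1}$$
$$\frac{|\Gamma(q)\cup\Gamma(u)| - |\Gamma(q)\cap\Gamma(u)|}{|\Gamma(q)\cup\Gamma(u)|\,(|\Gamma(q)\cup\Gamma(u)|+1)} > \frac{|\Gamma(q)\cup\Gamma(u)| - |\Gamma(q)\cap\Gamma(u)|}{(|\Gamma(q)\cup\Gamma(u)|+2)(|\Gamma(q)\cup\Gamma(u)|+1)}$$

The two numerators are equal, so this holds exactly when $|\Gamma(q)\cup\Gamma(u)| - |\Gamma(q)\cap\Gamma(u)| > 0$, i.e., when $f_W(q,u) < 1$; in that case it is true, since the denominator of the left side is smaller than that of the right side. □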
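The two closed-form gains in Proof 31 can be checked exactly on small set sizes. This is a minimal sketch, assuming each DR step adds one new context shared by q and u, so that $|\Gamma(q)\cap\Gamma(u)|$ and $|\Gamma(q)\cup\Gamma(u)|$ both grow by one; the function name `jaccard_gains` is hypothetical.

```python
from fractions import Fraction

def jaccard_gains(inter, union):
    """Gains of the first and second DR step for Jaccard = inter / union."""
    f0 = Fraction(inter, union)
    f1 = Fraction(inter + 1, union + 1)
    f2 = Fraction(inter + 2, union + 2)
    return f1 - f0, f2 - f1

# Compare with the closed forms in the proof on a small grid of set sizes.
for union in range(1, 8):
    for inter in range(0, union + 1):
        g1, g2 = jaccard_gains(inter, union)
        assert g1 == Fraction(union - inter, union * (union + 1))
        assert g2 == Fraction(union - inter, (union + 1) * (union + 2))
        assert (g1 > g2) == (inter < union)  # DR holds iff f_W(q, u) < 1

print(jaccard_gains(1, 3))  # (Fraction(1, 6), Fraction(1, 10))
```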
Next, we prove conditions for the Pointwise Mutual Information.

Lemma 32 Pointwise Mutual Information satisfies DR if $\frac{1}{|\Gamma(q)|+|\Gamma(u)|+1} > f_W(q,u)$.
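Proof 32 For DR to hold, the following must be true:

$$\frac{|\Gamma(q)\cap\Gamma(u)|+1}{(|\Gamma(q)|+1)(|\Gamma(u)|+1)} - \frac{|\Gamma(q)\cap\Gamma(u)|}{|\Gamma(q)||\Gamma(u)|} > \frac{|\Gamma(q)\cap\Gamma(u)|+2}{(|\Gamma(q)|+2)(|\Gamma(u)|+2)} - \frac{|\Gamma(q)\cap\Gamma(u)|+1}{(|\Gamma(q)|+1)(|\Gamma(u)|+1)}$$
$$\frac{|\Gamma(q)||\Gamma(u)| - |\Gamma(q)\cap\Gamma(u)|(|\Gamma(q)|+|\Gamma(u)|+1)}{(|\Gamma(q)|+1)(|\Gamma(u)|+1)|\Gamma(q)||\Gamma(u)|} > \frac{|\Gamma(q)||\Gamma(u)| - |\Gamma(q)\cap\Gamma(u)|(|\Gamma(q)|+|\Gamma(u)|+1) - 2|\Gamma(q)\cap\Gamma(u)| - 2}{(|\Gamma(q)|+2)(|\Gamma(u)|+2)(|\Gamma(q)|+1)(|\Gamma(u)|+1)}$$
$$\frac{|\Gamma(q)||\Gamma(u)| - |\Gamma(q)\cap\Gamma(u)|(|\Gamma(q)|+|\Gamma(u)|+1)}{|\Gamma(q)||\Gamma(u)|} > \frac{|\Gamma(q)||\Gamma(u)| - |\Gamma(q)\cap\Gamma(u)|(|\Gamma(q)|+|\Gamma(u)|+1) - 2(|\Gamma(q)\cap\Gamma(u)|+1)}{(|\Gamma(q)|+2)(|\Gamma(u)|+2)}$$

The condition of the lemma is equivalent to the common term $|\Gamma(q)||\Gamma(u)| - |\Gamma(q)\cap\Gamma(u)|(|\Gamma(q)|+|\Gamma(u)|+1)$ being positive. Under that condition the last inequality is true, since $|\Gamma(q)||\Gamma(u)| < (|\Gamma(q)|+2)(|\Gamma(u)|+2)$ and $|\Gamma(q)\cap\Gamma(u)|+1 > 0$.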

□ 
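A small exhaustive check that the lemma's condition implies the DR inequality. This is a minimal sketch, assuming the score used in the proof, $|\Gamma(q)\cap\Gamma(u)|/(|\Gamma(q)||\Gamma(u)|)$, and that each DR step adds one new context shared by q and u; the function names are hypothetical.

```python
from fractions import Fraction

def pmi_dr(inter, a, b):
    """Direct DR check with a = |Γ(q)|, b = |Γ(u)|, inter = |Γ(q) ∩ Γ(u)|."""
    f0 = Fraction(inter, a * b)
    f1 = Fraction(inter + 1, (a + 1) * (b + 1))
    f2 = Fraction(inter + 2, (a + 2) * (b + 2))
    return f1 - f0 > f2 - f1

def lemma32_condition(inter, a, b):
    return Fraction(1, a + b + 1) > Fraction(inter, a * b)

# Whenever the lemma's condition holds, the DR inequality holds as well.
for a in range(1, 8):
    for b in range(1, 8):
        for inter in range(0, min(a, b) + 1):
            if lemma32_condition(inter, a, b):
                assert pmi_dr(inter, a, b)

print(pmi_dr(0, 3, 3), lemma32_condition(0, 3, 3))  # True True
```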

Next,  we  prove  conditions  for  the  Forward  Random  Walk. 

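Lemma 33 Forward Random Walk satisfies DR if and only if $\theta > \frac{2D_{\hat c\hat c}D_{\tilde c\tilde c}}{3D_{\tilde c\tilde c} - D_{\hat c\hat c}}\left(f_W(q,u) - \frac{Q_{qq}(D_{\tilde c\tilde c} - D_{\hat c\hat c})}{2D_{\hat c\hat c}D_{\tilde c\tilde c}}\right)$.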

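Proof 33 For DR to hold, the following must be true:

$$f_{\hat W}(q,u) - f_W(q,u) > f_{\tilde W}(q,u) - f_{\hat W}(q,u)$$
$$2f_{\hat W}(q,u) > f_W(q,u) + f_{\tilde W}(q,u)$$
$$\frac{2\theta^2}{(Q_{qq}+\theta)D_{\hat c\hat c}} + \sum_{c\neq\hat c,\tilde c}\frac{2W_{qc}W_{uc}}{(Q_{qq}+\theta)D_{cc}} > \sum_{c\neq\hat c,\tilde c}\frac{W_{qc}W_{uc}}{Q_{qq}D_{cc}} + \frac{\theta^2}{(Q_{qq}+2\theta)D_{\hat c\hat c}} + \frac{\theta^2}{(Q_{qq}+2\theta)D_{\tilde c\tilde c}} + \sum_{c\neq\hat c,\tilde c}\frac{W_{qc}W_{uc}}{(Q_{qq}+2\theta)D_{cc}}$$
$$\theta^2\left(\frac{2}{(Q_{qq}+\theta)D_{\hat c\hat c}} - \frac{1}{(Q_{qq}+2\theta)D_{\hat c\hat c}} - \frac{1}{(Q_{qq}+2\theta)D_{\tilde c\tilde c}}\right) > \sum_{c\neq\hat c,\tilde c}\frac{W_{qc}W_{uc}}{D_{cc}}\left(\frac{1}{Q_{qq}} - \frac{2}{Q_{qq}+\theta} + \frac{1}{Q_{qq}+2\theta}\right)$$
$$\theta^2\,\frac{(3D_{\tilde c\tilde c} - D_{\hat c\hat c})\theta + (D_{\tilde c\tilde c} - D_{\hat c\hat c})Q_{qq}}{D_{\hat c\hat c}D_{\tilde c\tilde c}(Q_{qq}+\theta)(Q_{qq}+2\theta)} > \sum_{c\neq\hat c,\tilde c}\frac{2\theta^2\,W_{qc}W_{uc}}{D_{cc}Q_{qq}(Q_{qq}+\theta)(Q_{qq}+2\theta)}$$
$$\frac{3D_{\tilde c\tilde c} - D_{\hat c\hat c}}{2D_{\hat c\hat c}D_{\tilde c\tilde c}}\,\theta + \frac{Q_{qq}(D_{\tilde c\tilde c} - D_{\hat c\hat c})}{2D_{\hat c\hat c}D_{\tilde c\tilde c}} > f_W(q,u)$$
$$\theta > \frac{2D_{\hat c\hat c}D_{\tilde c\tilde c}}{3D_{\tilde c\tilde c} - D_{\hat c\hat c}}\left(f_W(q,u) - \frac{Q_{qq}(D_{\tilde c\tilde c} - D_{\hat c\hat c})}{2D_{\hat c\hat c}D_{\tilde c\tilde c}}\right)$$

The last step divides by $3D_{\tilde c\tilde c} - D_{\hat c\hat c}$, which is taken to be positive.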

□ 
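The equivalence in Proof 33 can be checked exactly on toy data. This is a minimal sketch under stated assumptions: the Forward Random Walk score is taken as $f_W(q,u) = \sum_c W_{qc}W_{uc}/(Q_{qq}D_{cc})$; q and u initially have no weight in $\hat c$ or $\tilde c$; each DR step gives both q and u weight $\theta$ in one of these contexts; and $D_{\hat c\hat c}$, $D_{\tilde c\tilde c}$ are column sums after the update. The helper names, context labels, and weights are hypothetical. The check uses the pre-division form of the condition, so it does not need the sign assumption.

```python
from fractions import Fraction

def col_sum(W, c):
    return sum(w for (_, cc), w in W.items() if cc == c)

def row_sum(W, x):
    return sum(w for (v, _), w in W.items() if v == x)

def frw(W, q, u):
    """Assumed FRW score: sum_c W_qc * W_uc / (Q_qq * D_cc)."""
    contexts = {c for (_, c) in W}
    score = sum(Fraction(W.get((q, c), 0) * W.get((u, c), 0)) / col_sum(W, c) for c in contexts)
    return score / row_sum(W, q)

def add_shared(W, q, u, c, theta):
    """Hypothetical DR step: give both q and u weight theta in context c."""
    W_new = dict(W)
    for x in (q, u):
        W_new[(x, c)] = W_new.get((x, c), 0) + theta
    return W_new

def dr_direct(W, q, u, c_hat, c_tilde, theta):
    W1 = add_shared(W, q, u, c_hat, theta)
    W2 = add_shared(W1, q, u, c_tilde, theta)
    f0, f1, f2 = frw(W, q, u), frw(W1, q, u), frw(W2, q, u)
    return f1 - f0 > f2 - f1

def dr_lemma33(W, q, u, c_hat, c_tilde, theta):
    D_hat = col_sum(W, c_hat) + 2 * theta      # column sum of c_hat after the first step
    D_tilde = col_sum(W, c_tilde) + 2 * theta  # column sum of c_tilde after the second step
    Q_qq = row_sum(W, q)
    lhs = (3 * D_tilde - D_hat) * theta + (D_tilde - D_hat) * Q_qq
    return lhs > 2 * D_hat * D_tilde * frw(W, q, u)

# Hypothetical toy data: "h" and "t" already hold another node v, but not q or u.
W = {("q", "a"): 2, ("u", "a"): 1, ("v", "a"): 1, ("q", "b"): 1, ("u", "b"): 3,
     ("v", "h"): 4, ("v", "t"): 1}
print(dr_direct(W, "q", "u", "h", "t", 1), dr_direct(W, "q", "u", "t", "h", 1))  # False True
```

In this toy case the order of the two contexts matters: adding the heavier context first violates DR, while the reverse order satisfies it.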


Next,  we  prove  conditions  for  the  Backward  Random  Walk. 
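Lemma 34 Backward Random Walk satisfies DR if and only if $\theta > \frac{2D_{\hat c\hat c}D_{\tilde c\tilde c}}{3D_{\tilde c\tilde c} - D_{\hat c\hat c}}\left(f_W(q,u) - \frac{Q_{uu}(D_{\tilde c\tilde c} - D_{\hat c\hat c})}{2D_{\hat c\hat c}D_{\tilde c\tilde c}}\right)$.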

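Proof 34 For DR to hold, the following must be true:

$$f_{\hat W}(q,u) - f_W(q,u) > f_{\tilde W}(q,u) - f_{\hat W}(q,u)$$
$$2f_{\hat W}(q,u) > f_W(q,u) + f_{\tilde W}(q,u)$$
$$\frac{2\theta^2}{(Q_{uu}+\theta)D_{\hat c\hat c}} + \sum_{c\neq\hat c,\tilde c}\frac{2W_{uc}W_{qc}}{(Q_{uu}+\theta)D_{cc}} > \sum_{c\neq\hat c,\tilde c}\frac{W_{uc}W_{qc}}{Q_{uu}D_{cc}} + \frac{\theta^2}{(Q_{uu}+2\theta)D_{\hat c\hat c}} + \frac{\theta^2}{(Q_{uu}+2\theta)D_{\tilde c\tilde c}} + \sum_{c\neq\hat c,\tilde c}\frac{W_{uc}W_{qc}}{(Q_{uu}+2\theta)D_{cc}}$$
$$\theta^2\left(\frac{2}{(Q_{uu}+\theta)D_{\hat c\hat c}} - \frac{1}{(Q_{uu}+2\theta)D_{\hat c\hat c}} - \frac{1}{(Q_{uu}+2\theta)D_{\tilde c\tilde c}}\right) > \sum_{c\neq\hat c,\tilde c}\frac{W_{uc}W_{qc}}{D_{cc}}\left(\frac{1}{Q_{uu}} - \frac{2}{Q_{uu}+\theta} + \frac{1}{Q_{uu}+2\theta}\right)$$
$$\theta^2\,\frac{(3D_{\tilde c\tilde c} - D_{\hat c\hat c})\theta + (D_{\tilde c\tilde c} - D_{\hat c\hat c})Q_{uu}}{D_{\hat c\hat c}D_{\tilde c\tilde c}(Q_{uu}+\theta)(Q_{uu}+2\theta)} > \sum_{c\neq\hat c,\tilde c}\frac{2\theta^2\,W_{uc}W_{qc}}{D_{cc}Q_{uu}(Q_{uu}+\theta)(Q_{uu}+2\theta)}$$

Dividing both sides by $2\theta^2/((Q_{uu}+\theta)(Q_{uu}+2\theta))$ and solving for $\theta$ as in Proof 33 yields the lemma.
□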
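The same exact check applies to Lemma 34, with $Q_{uu}$ in place of $Q_{qq}$. This is a minimal sketch, assuming the Backward Random Walk score $f_W(q,u) = \sum_c W_{uc}W_{qc}/(Q_{uu}D_{cc})$ and the same DR step as before (q and u each gain weight $\theta$ in a context where neither had weight, with column sums measured after the update). Helper names and toy weights are hypothetical.

```python
from fractions import Fraction

def col_sum(W, c):
    return sum(w for (_, cc), w in W.items() if cc == c)

def row_sum(W, x):
    return sum(w for (v, _), w in W.items() if v == x)

def brw(W, q, u):
    """Assumed BRW score: sum_c W_uc * W_qc / (Q_uu * D_cc)."""
    contexts = {c for (_, c) in W}
    score = sum(Fraction(W.get((u, c), 0) * W.get((q, c), 0)) / col_sum(W, c) for c in contexts)
    return score / row_sum(W, u)

def add_shared(W, q, u, c, theta):
    """Hypothetical DR step: give both q and u weight theta in context c."""
    W_new = dict(W)
    for x in (q, u):
        W_new[(x, c)] = W_new.get((x, c), 0) + theta
    return W_new

def dr_direct(W, q, u, c_hat, c_tilde, theta):
    W1 = add_shared(W, q, u, c_hat, theta)
    W2 = add_shared(W1, q, u, c_tilde, theta)
    return 2 * brw(W1, q, u) > brw(W, q, u) + brw(W2, q, u)

def dr_lemma34(W, q, u, c_hat, c_tilde, theta):
    D_hat = col_sum(W, c_hat) + 2 * theta
    D_tilde = col_sum(W, c_tilde) + 2 * theta
    Q_uu = row_sum(W, u)
    lhs = (3 * D_tilde - D_hat) * theta + (D_tilde - D_hat) * Q_uu
    return lhs > 2 * D_hat * D_tilde * brw(W, q, u)

# Hypothetical toy data; u also has a private context "z", so Q_uu differs from Q_qq.
W = {("q", "a"): 2, ("u", "a"): 1, ("v", "a"): 1, ("q", "b"): 1, ("u", "b"): 3,
     ("u", "z"): 2, ("v", "h"): 4, ("v", "t"): 1}
print(dr_direct(W, "q", "u", "h", "t", 1), dr_direct(W, "q", "u", "t", "h", 1))  # False True
```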


Next,  we  prove  conditions  for  the  Mean  Meeting  Time. 

Lemma 35 Mean Meeting Time satisfies DR if and only if $3\theta^2 + (Q_{qq}+Q_{uu})\theta > \left(2\theta^2 + 3(Q_{qq}+Q_{uu})\theta + Q_{qq}^2 + Q_{uu}^2 + Q_{qq}Q_{uu}\right) f_W(q,u)$.

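Proof 35 For DR to hold, the following must be true:

$$f_{\hat W}(q,u) - f_W(q,u) > f_{\tilde W}(q,u) - f_{\hat W}(q,u),$$
$$\frac{2\left(\theta^2 + \sum_{c\neq\hat c,\tilde c} W_{qc}W_{uc}\right)}{(Q_{qq}+\theta)(Q_{uu}+\theta)} > \frac{\sum_{c\neq\hat c,\tilde c} W_{qc}W_{uc}}{Q_{qq}Q_{uu}} + \frac{2\theta^2 + \sum_{c\neq\hat c,\tilde c} W_{qc}W_{uc}}{(Q_{qq}+2\theta)(Q_{uu}+2\theta)}$$

Multiplying both sides by $Q_{qq}Q_{uu}(Q_{qq}+\theta)(Q_{uu}+\theta)(Q_{qq}+2\theta)(Q_{uu}+2\theta)$, dividing by $2\theta^2 Q_{qq}Q_{uu}$, and substituting $f_W(q,u) = \sum_{c\neq\hat c,\tilde c} W_{qc}W_{uc}/(Q_{qq}Q_{uu})$ yields the condition of the lemma.
□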
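Lemma 35 depends only on $S = \sum_c W_{qc}W_{uc}$, $Q_{qq}$, $Q_{uu}$, and $\theta$, so it can be checked exactly over a grid. This is a minimal sketch, assuming Mean Meeting Time is $S/(Q_{qq}Q_{uu})$ and that each DR step adds a fresh context in which q and u both have weight $\theta$; the function names are hypothetical.

```python
from fractions import Fraction

def mmt_dr_direct(S, Q_qq, Q_uu, theta):
    """Direct DR check for f = S / (Q_qq * Q_uu) under two theta-weighted shared contexts."""
    S, Q_qq, Q_uu, theta = map(Fraction, (S, Q_qq, Q_uu, theta))
    f0 = S / (Q_qq * Q_uu)
    f1 = (S + theta**2) / ((Q_qq + theta) * (Q_uu + theta))
    f2 = (S + 2 * theta**2) / ((Q_qq + 2 * theta) * (Q_uu + 2 * theta))
    return f1 - f0 > f2 - f1

def lemma35(S, Q_qq, Q_uu, theta):
    """Closed-form condition of Lemma 35."""
    S, Q_qq, Q_uu, theta = map(Fraction, (S, Q_qq, Q_uu, theta))
    f = S / (Q_qq * Q_uu)
    lhs = 3 * theta**2 + (Q_qq + Q_uu) * theta
    rhs = (2 * theta**2 + 3 * (Q_qq + Q_uu) * theta + Q_qq**2 + Q_uu**2 + Q_qq * Q_uu) * f
    return lhs > rhs

# The two forms should agree everywhere on a small grid.
for S in [0, 1, 3, 6]:
    for Q_qq in [1, 2, 4]:
        for Q_uu in [1, 3]:
            for theta in [Fraction(1, 2), 1, 3]:
                assert mmt_dr_direct(S, Q_qq, Q_uu, theta) == lemma35(S, Q_qq, Q_uu, theta)

print(mmt_dr_direct(0, 1, 1, 1), mmt_dr_direct(4, 2, 2, 1))  # True False
```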
Finally, we show that RWS satisfies the Diminishing Returns (DR) axiom.


Lemma 36 RWS satisfies DR if and only if $s > 2\alpha - Q_{qq}$, where $\alpha = \sum_c W_{qc}W_{uc}/D_{cc}$.
Proof  36  From  the  definition  of  RWS  in  Section  5, 


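$$RWS_W(q,u) = \frac{\alpha}{s + Q_{qq}}$$
$$RWS_{\hat W}(q,u) = \frac{\alpha + \theta/2}{s + Q_{qq} + \theta}$$
$$RWS_{\tilde W}(q,u) = \frac{\alpha + \theta}{s + Q_{qq} + 2\theta}$$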


DR  is  satisfied  if  and  only  if 


$$RWS_{\hat W}(q,u) - RWS_W(q,u) > RWS_{\tilde W}(q,u) - RWS_{\hat W}(q,u)$$
$$\frac{2\alpha + \theta}{s + Q_{qq} + \theta} > \frac{\alpha}{s + Q_{qq}} + \frac{\alpha + \theta}{s + Q_{qq} + 2\theta}$$

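Multiplying both sides by $(s+Q_{qq})(s+Q_{qq}+\theta)(s+Q_{qq}+2\theta)$ and rearranging, the inequality reduces to $\theta^2(s + Q_{qq} - 2\alpha) > 0$, which holds exactly when $s > 2\alpha - Q_{qq}$.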


□ 
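The reduction to $\theta^2(s + Q_{qq} - 2\alpha) > 0$ can be confirmed exactly over a grid. This is a minimal sketch that takes the three RWS values from the proof as given (with $\alpha$, $Q_{qq}$, $\theta > 0$ and $s \ge 0$); the function names are hypothetical.

```python
from fractions import Fraction

def rws_dr(alpha, Q_qq, s, theta):
    """Direct DR check using the three RWS values stated in Proof 36."""
    alpha, Q_qq, s, theta = map(Fraction, (alpha, Q_qq, s, theta))
    f0 = alpha / (s + Q_qq)
    f1 = (alpha + theta / 2) / (s + Q_qq + theta)
    f2 = (alpha + theta) / (s + Q_qq + 2 * theta)
    return f1 - f0 > f2 - f1

def lemma36(alpha, Q_qq, s):
    return s > 2 * alpha - Q_qq

# The direct check and the lemma agree, including the tie s = 2*alpha - Q_qq.
for alpha in [Fraction(1, 2), 1, 3]:
    for Q_qq in [1, 2, 5]:
        for s in [0, 1, 2, 4, 10]:
            for theta in [Fraction(1, 2), 1, 4]:
                assert rws_dr(alpha, Q_qq, s, theta) == lemma36(alpha, Q_qq, s)

print(rws_dr(1, 1, 0, 1), rws_dr(1, 1, 3, 1))  # False True
```

Raising the sink weight $s$ is what restores DR here: with $\alpha = Q_{qq} = 1$, any $s > 1$ suffices regardless of $\theta$.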